INTRODUCTION


African  Americans  (AAs)  have  a  greater  incidence  of  prostate  cancer  (PrCa)  than  European  Americans  (EAs)  and  their 
PrCas  tend  to  be  more  aggressive.  Because  aggressive  prostate  cancers  are  often  treated  non-surgically  and  because  AAs 
frequently  select  radiation  instead  of  surgery,  there  are  disproportionately  fewer  radical  prostatectomies  available  to  study 
the  molecular  features  of  PrCas  in  AAs.  Thus,  much  less  is  known  concerning  the  biology  of  PrCa  in  AAs  and  this  lack  of 
knowledge  can  limit  therapeutic  options  for  AAs  with  PrCa,  especially  the  choice  of  active  surveillance  (AS).  To  reduce 
this  racial  disparity  in  PrCa  research  in  AAs,  this  project  focuses  on  the  molecular  analysis  of  prostate  biopsies  in  order  to 
capture  a  more  representative  study  population.  We  utilize  an  innovative  tissue  print  technology  in  which  nitrocellulose 
blots  (tissue  prints)  are  collected  from  each  prostate  biopsy  core  and  used  as  a  source  of  RNA,  DNA  and  proteins  for 
biomarker  studies.  By  focusing  on  biopsies,  we  are  able  to  identify  molecular  features  of  a  wide  range  of  PrCas  including 
cancers  from  AA  patients  and  from  EA  patients  who  select  radiation  therapy,  have  high  Gleason  scores  and/or  high  stage 
of  PrCas  that  cannot  be  successfully  treated  by  radical  surgery. 

There  are  several  hypotheses  as  to  why  PrCas  are  more  aggressive  in  AAs  including  those  considering  social,  cultural  and 
economic  issues  that  may  delay  evaluation  of  prostate  health  and  appropriate  screening  and  therapy  for  prostate  diseases. 
Nevertheless,  most  studies  have  identified  that  biological  issues  also  are  likely  involved  in  the  aggressiveness  of  PrCas  in 
AAs.  One  biological  based  hypothesis  is  that  there  are  unidentified  molecular  characteristics  that  affect  the  biology  of 
PrCas  in  AAs.  These  may  be  inherited  genetic  factors,  DNA  mutations  in  the  tumor  or  epigenetic  changes  secondary  to  or 
interacting  with  other  biological  changes  caused  by,  for  example  environmental  exposures,  diet,  and/or  obesity.  To  help 
differentiate  inherited  and  environmental  factors  that  may  lead  to  more  aggressive  PrCa  in  AA,  our  molecular  analyses 
include  ancestry  genotyping  to  identify  West  African,  European  (EU),  and  Native  American  (NA)  ancestry  based  on 
single  nucleotide  polymorphisms  (SNPs)  that  are  used  as  ancestry  informative  markers  (AIMs). 

Our  work  during  the  last  year  revealed  that  self-identified  AAs  may  fall  into  two  subgroups  that  differ  with  respect  to 
PrCa  aggressiveness.  Specifically,  in  a  series  of  83  self-identified  AAs  we  observed  that  almost  all  (95%)  of  the  AA 
individuals  diagnosed  with  high  grade  PrCa  (Gleason  7  or  more)  on  biopsy  had  more  than  75%  West  African  (WA) 
ancestry  by  AIMs  genotyping,  while  a  large  proportion  (40%)  of  the  AAs  diagnosed  with  no  cancer  on  biopsy  showed 
more  than  25%  EU  genetic  admixture.  This  finding  suggests  that  in  AAs,  ancestry  genotyping  may  be  helpful  in  assessing 
individual  PrCa  risk  and  provide  useful  information  for  AAs  who  are  considering  AS  rather  than  immediate  treatment.  It 
also  points  to  a  need  to  adopt  an  “ancestry  informed”  approach  to  characterizing  PrCa  in  AA  populations. 

Analysis of high grade cancers using prostate biopsy tissue prints has revealed prostate cancer subtypes that were either
unrecognized or significantly underestimated in previous studies. Our gene expression data have identified 3 molecules
involved in the transport and/or synthesis of lipids that are highly overexpressed in a subset of PrCas.
These include fatty acid binding protein 5 (FABP5) and fatty acid binding protein 1 (FABP1), genes that have not been
previously reported to be differentially expressed in PrCas. Interestingly, higher levels of FABP5 expression seem to be
more common in PrCas from AAs. A third lipid pathway gene overexpressed in subsets of PrCas is fatty acid synthase
(FASN), previously shown to be involved in the emergence of castrate resistant PrCa. FABP5, FABP1 and FASN
overexpression represent actionable alterations in genes that control PrCa lipid processing and metabolism and may reveal
links between diet, obesity and aggressive forms of prostate cancer.

During this reporting period, we obtained mass spectrometry data on proteins in macrodissected prostate glands
from both AA and EA patients. These tissues also were analyzed for protein differences in PrCas between AAs and EAs.
Comparing PrCas with uninvolved prostate glands, 53 proteins were statistically increased and 32 proteins
were statistically decreased. Comparing AAs with EAs, 10 proteins in AAs were statistically increased and 21 proteins were
statistically decreased.

In addition, in this reporting period, we have expanded observations of epigenetic effects of prostate cancer foci on
surrounding uninvolved prostate glands, including promoter hypermethylation of genes such as glutathione S-transferase
pi 1 (GSTP1), adenomatous polyposis coli (APC) and Ras association domain family member 1 (RASSF1). These are
translatable as improved “field effect” tests that can be used to detect occult high grade cancer in patients who are
considering active surveillance.

This  reporting  period  also  covers  a  significant  expansion  of  our  collaboration  with  the  new  MRI/US  Fusion  Guided 
Prostate  Biopsy  service  at  UAB.  Because  we  can  now  overlay  MR  imaging  information  with  pathology  and  with  tissue- 
print  molecular  marker  mapping,  we  are  especially  well  positioned  to  translate  what  we  are  learning  about  PrCa  in  AAs  to 
better  guide  decisions  about  active  surveillance. 

BODY 

Administrative: 

The  major  administrative  problem  involved  the  DOD  action  on  a  request  from  Drs.  Gaston  and  Grizzle’s  laboratories  for  a 
no  cost  extension  of  their  respective  grants  beyond  May  31,  2015  (UAB)  and  June  2015  (Tufts).  The  no  cost  extension  of 
Dr.  Gaston  was  approved  on  8/19/15.  The  no  cost  extension  for  Dr.  Grizzle  was  approved  on  9/15/15.  Part  of  the  problem 
with  the  delays  was  that  UAB’s  Grants  and  Contracts  Office  had  used  the  wrong  grant  number  (actually  Dr.  Gaston’s 
grant  number)  in  yearly  financial  reports  for  Dr.  Grizzle’s  grant.  DOD  did  not  report  this  to  us  until  late  in  the 
administrative  process.  This  period  of  administrative  and  financial  uncertainty  caused  some  shift  in  specific  scientific 
approaches  of  the  grant. 

In  this  reporting  period,  other  administrative  issues  included  the  renewal  of  IRB  at  UAB  and  the  change  in  status  of  the 
UCA  IRB  from  closed  to  active  in  order  to  collect  additional  information  from  patients  who  had  been  accrued  at  UCA. 
Because  the  collection  of  the  contracted  60  cases  of  PrCa  had  been  successfully  completed  at  UCA  in  2014,  the  Western 
IRB  was  asked  to  classify  the  IRB  for  UCA  as  being  in  a  “data  analysis”  only  category  because  no  further  cases  were 
being  accrued.  In  2014,  UCA  without  UAB’s  knowledge  closed  the  UCA  Western  IRB  and  the  UCA  IRB  at  the  DOD. 
Both IRBs had to be reopened for data analysis because additional data collection was needed for publication. This was
completed  and  the  additional  data  were  collected  and  transferred  to  UAB.  The  IRB  at  UCA  now  remains  in  the  “data 
analysis”  category. 

The  Tufts  Medical  Center  IRB  has  classified  Dr.  Gaston’s  component  of  this  project  as  exempt. 

During  this  period,  UAB  trained  4  student  assistants,  Ms.  Fowler,  Ms.  Fuller,  Ms.  Perez  Aponte  and  Ms.  Sun.  Others 
involved  in  the  project  did  not  change.  During  this  period,  Ms.  Lian  Tian  was  replaced  as  the  technician  in  Dr.  Gaston’s 
laboratory  at  Tufts  Medical  Center  with  Mr.  James  Kearns.  In  addition,  an  undergraduate  research  intern,  Mr.  Ravi 
Chinsky,  assisted  with  the  project. 

In 2015, Dr. Grizzle continued to be a consultant to a continuing grant, the DOD Prostate Cancer Tissue Repository. In
addition, in 2015, he was added to the External Advisory Committee of this grant.

In 2015, Dr. Gaston was chairperson for the March and July Special Emphasis Panels reviewing the NCI Innovative
Molecular Analysis Technology SBIR grants (ZRG1 OTC-H (10)), was an ad-hoc member of the NCI Chemo/Dietary
Prevention (CDP) Study Section in February and was a member of the NCI Cancer Detection, Diagnosis and Treatment
Technologies for Global Health review panel in July (ZCA1 TCRB-6 (A1)).

Specific Scientific Progress and Results:

Collection  of  Nitrocellulose  Blots  (Tissue  Prints)  for  Analysis:  In  this  reporting  period,  collection  of  tissue  prints 
shifted  completely  to  UAB  with  a  focus  on  obtaining  tissue  prints  from  biopsy  cores  obtained  using  magnetic  resonance 
imaging  (MRI)  fused  with  ultrasound  (MRI-US)  to  guide  the  biopsy  procedure.  While  biopsies  obtained  using  the 
standard  US  approach  are  collected  blindly  as  to  areas  of  the  prostate  suspicious  for  cancer,  MRI-US  adds  enough 
information  from  the  MRI  component  to  identify  areas  suspicious  for  cancer  to  which  biopsies  can  be  directed.  In  2015, 
UAB accrued 15 patients (10 AAs and 5 EAs) from whom biopsies were obtained using standard US technology and 53
patients (10 AAs and 43 EAs) from whom biopsies were obtained using MRI-US technology. In some cases, biopsies
are obtained only by MRI-US from areas suspicious for cancer, while in other cases MRI-US targeted biopsies are
combined with standard “blind” US biopsies. Tissue prints were obtained from all biopsy cores, for a
total of 835 tissue prints (237 from AAs and 598 from EAs). This does not count
tissue prints from one case which was discarded because the patient was infected with Hepatitis C. Also, 13 tissue prints
were obtained from two radical prostatectomies from patients with prior biopsies. These results are included in our
summary of cases (Tables “EN”). A summary of our cumulative enrollment and biopsy tissue print collection is included
in EN Tables 1-6.

Differential Expression of Prostate Biomarkers Associated with Lipid Transport, Synthesis, and Metabolism.

Previously, Dr. Gaston used gene analysis of mRNA from blots of prostate cancer to identify the importance of fatty acid
binding protein 5 (FABP5) in prostate cancer. This observation, confirmed by qrtPCR and by review of published gene
sequencing data, identified a potential subset of PrCas which have elevated mRNAs for FABP5. Because of our focus
on lipid transport, we elected to study two other related molecules associated with lipid control in PrCa: fatty
acid binding protein 1 (FABP1) and fatty acid synthase (FASN). These molecules also were identified by qrtPCR to be
elevated in PrCa. To evaluate the phenotypic differential distribution of FABP5, FABP1 and FASN in PrCa in AAs and
EAs, UAB identified, collected, reviewed and selected paraffin blocks of normal prostate from radical cystectomy
specimens from AAs and EAs and from AA and EA cases with radical prostatectomies containing PrCa. Sections from
immunostained cases were sent to Dr. Gaston, who works with Dr. Kittles to identify racial admixtures. Also, for some
immunostained cases, sections are sent to Dr. Gaston for qrtPCR analysis of biomarkers of interest.

Our studies in this reporting period have been focused on determining the differential expression of FABP5, FABP1 and
FASN in patients evaluated for prostate cancers using radical prostatectomies; also, we have evaluated these molecules in
patients who underwent a radical cystectomy but were found to have no prostate cancer in the associated removed prostates
(designated as normal prostate).

We  found  that  FABP5  is  strongly  expressed  in  prostate  cancers  but  there  is  low  to  no  phenotypic  expression  in  normal 
prostate tissue or normal appearing (uninvolved) prostate glands from patients with prostate cancers. Higher levels of
FABP5 were expressed in PrCas in AAs compared to EAs. There was expression of FABP5 in low and high grade
prostate  intraepithelial  neoplasia  (PIN).  The  intracellular  pattern  of  expression  in  PrCa  was  primarily  cytoplasmic  with 
accentuation  of  staining  in  the  areas  of  the  cell  membrane  and  areas  of  the  nuclear  membrane.  There  also  was  expression 
in  the  nuclei  of  PrCa  cells.  (Figure  IHC-1,  Figure  IHC-2,  Figure  IHC-3  in  drop  box.) 

In contrast to FABP5, FABP1 had only slight to no differential expression in prostate cancers compared to normal
prostate glands (no cancer) or to uninvolved prostate glands from the same matching cases of prostate cancer. Because of
the large number of cases, the difference between uninvolved prostate glands and PrCa is still statistically significant in the
cytoplasmic and membrane areas of the malignant cells. Of note, this low differential expression does not exclude FABP1
from being important as a potential target for therapy. In PrCa, FABP1 shows strong
expression in the cytoplasm with somewhat stronger expression in the area of the cell membrane and the perinuclear area.
There is weaker expression in the nuclei of tumor cells compared to cytoplasmic and membrane staining. (Figure IHC-4,
Figure IHC-5, Figure IHC-6 in drop box.)

Fatty acid synthase (FASN) has clear differential expression in prostate cancer when compared with the minimal
expression in normal prostate glands (from non-cancer cases) and uninvolved (normal appearing) prostate glands from
matching cases with cancer. In contrast to FABP5, higher levels of FASN tend to occur in EAs. The intracellular
expression of FASN is somewhat variable even for malignant cells within the same gland. In PrCa, there is prominent
cytoplasmic and cellular membrane staining with accentuation in the perinuclear area. Of note, in contrast to FABP5,
most PrCa cells frequently show no nuclear staining. However, in high grade tumors there seems to be an increase in
FASN expression in nuclei. This change to an intracellular nuclear pattern may reflect an important regulatory pathway.
(Figure IHC-7, Figure IHC-8, Figure IHC-9 in drop box.)

Because FABP5 and FASN have been observed in some cases of PrCa to be inversely expressed at the mRNA level, we
evaluated this at the protein level. The correlations are shown for FABP5 versus FASN (Figure IHC-10), FABP1 vs
FASN (Figure IHC-11) and FABP1 vs FABP5 (Figure IHC-12). The result for FABP5 vs FASN did not demonstrate the
pattern observed at the mRNA level; however, it did show increased expression of FABP5 in a subset of
AAs and a similar increase in expression of FASN in EAs (Figure IHC-10).
This is demonstrated in TABLE IHC-1. Also of interest, there is increased nuclear expression of
FABP5 and FASN in subgroups of cases, which is indicative of a shift of FABP5 and FASN into the nuclei of some tumor cells.
Because of the large ranges in expression of FABP5 and FASN, there is not a statistically significant difference in the overall
pattern, so the analysis is based upon cutoffs of phenotypic expression which vary with the expression in each of the
intracellular areas (e.g., nuclear expression).

In summary, FABP5 is differentially expressed in patients with PrCa. Among these, patients with higher levels of FABP5
tend to be self-identified AAs. These results also were consistent with the results of mass spectrometry (MS). In
contrast, patients with higher levels of FASN tend to be self-identified EAs.

FABP5 and fatty acid binding protein 4 (FABP4) are increased in metabolic syndrome, a disorder which includes
central obesity, elevated glucose and a tendency to develop cardiovascular disease and adult onset diabetes. The
importance of metabolic syndrome has led to a widely used ELISA for FABP5 designed for serum. We have tested the
FABP5 ELISA and found that it is technically reproducible and easy to perform; however, because of
concern for the stability of multiple molecules in older samples of serum (Potter et al 2012), we elected to postpone
further evaluation of FABP5 in bodily fluids until a new set of fresh samples of serum is obtained. We have ordered
these specimens from the Cooperative Human Tissue Network (CHTN) to facilitate our studies.

Discovery of Proteins in PrCa Using Mass Spectrometry: Our initial approach to identify molecules associated with
the aggressiveness of PrCa and racial differences in these molecules used multiplex immunoassays of serum,
plasma and urine. A problem with these studies was that reports in the literature indicated that specific molecules in
stored bodily fluids begin to change after 1 to 2 years (Potter et al 2012). Until we could address this problem, we shifted to
mass spectrometry analysis comparing biomarkers in tissues of AA and EA patients to identify molecules differentially
expressed in PrCas. In this reporting period, using mass spectrometry (MS), we completed the analysis of 8 AA patients
with prostate cancer and 12 EA patients with prostate cancer for discovery of proteins associated with self-identified
AAs and self-identified EAs. For each category of patients, both paired PrCa and uninvolved prostate glands were
macrodissected. Thus, this study identified proteins/peptides which are differentially expressed in PrCas in addition to
proteins/peptides which are selectively overexpressed or underexpressed in AAs versus EAs.

The initial approach to mass spectrometry involved macrodissection of paraffin blocks of prostate cancer and matched
uninvolved prostate from the same cases of prostate cancer. Thus a total of 20 specimens of macrodissected prostate
cancer and 20 specimens of macrodissected matching uninvolved prostate were initially compared to identify molecules
differentially expressed in prostate cancer. Each macrodissected specimen was evaluated in 3 dimensions (i.e., externally
and longitudinally) to ensure that no cancer was present in the specimen of uninvolved prostate glands and that the PrCas
were composed of at least 60% malignant cells. In these specimens a total of 896 proteins were identified; after these
were filtered and judged to be statistically relevant, 514 proteins were evaluated, of which 53 were significantly increased in
abundance and 32 were significantly decreased in abundance in PrCas with a false discovery rate of <0.1% (Figure MS-1).
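
The report does not state which procedure was used to control the false discovery rate. As a minimal sketch, assuming a Benjamini-Hochberg step-up procedure applied to per-protein p-values, the selection at an FDR of 0.1% could look like the following. The function name and the p-values are hypothetical and are not data from this study.

```python
def benjamini_hochberg(p_values, fdr=0.001):
    """Flag hypotheses rejected by the Benjamini-Hochberg step-up procedure.

    Assumption: this is an illustrative FDR method, not necessarily the one
    used for the mass spectrometry analysis in this report.
    Returns a list of booleans aligned with p_values.
    """
    m = len(p_values)
    # Indices of the p-values in ascending order
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest 1-based rank k with p_(k) <= (k / m) * fdr
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * fdr:
            cutoff_rank = rank
    # Reject every hypothesis ranked at or below the cutoff
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= cutoff_rank:
            rejected[idx] = True
    return rejected


# Hypothetical p-values for five proteins (made up for illustration)
example_p = [0.00001, 0.0003, 0.04, 0.0002, 0.5]
example_calls = benjamini_hochberg(example_p, fdr=0.001)
```

In this toy input the three smallest p-values pass the step-up thresholds and the remaining two do not.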

Based on systems analysis, the major cellular localizations and processes associated with the 85 proteins differentially
expressed in PrCa are shown in Figures MS-2 and MS-3. The cytoplasm, the extracellular space and nuclei were the most
common localizations identified as the source of these proteins, and cellular processes and cellular regulation were the most
common themes in which these proteins were involved.

The  most  significant  20  proteins  identified  by  their  abundance  (increased  or  decreased)  to  be  differentially  expressed  are 
shown  in  Figure  MS-4.  The  main  proteins  of  interest  are  those  which  are  increased  significantly  in  PrCa. 


After the study of differentially expressed proteins in prostate cancer, we next focused on racial differences in prostate
cancer, comparing PrCas from AAs with PrCas from EAs. We used the same approach to filter the 896 proteins to 298
proteins. Of these, 10 proteins were found to be statistically increased in abundance and 21 were found to be statistically
decreased in abundance in AAs when compared to EAs (Figure MS-5). The 10 proteins found to be increased in
abundance are listed in Table MS-1 and those that are decreased are in Table MS-2.

In view of our interest in lipid controlling molecules, we noted that zinc-alpha-2-glycoprotein, previously reported by
others as a cancer marker that stimulates lipolysis, is increased in PrCa in AAs. Because this molecule may be involved in
the cachexia resulting from cancer, it will be added to our studies of lipids. Three other molecules of this group of 10 have
been associated with motility and potential metastases. These are galectin-3-binding protein, alpha-actinin-4, and keratin,
type II cytoskeletal 5. Also, ubiquitin-like modifier-activating enzyme, which has been proposed by others as a target for
cancer therapy, is elevated and is likely to be an important molecule in our study. Of interest and as expected, PSA is also
increased in AAs compared to EAs. Several other molecules whose importance is unknown also are listed in Table MS-1,
including SERPINA3, which will be discussed subsequently.

The 21 proteins that are decreased in tissue from PrCas in AAs versus EAs are listed in Table MS-2; however, our major
focus will be on proteins that are increased rather than decreased unless important molecules that are decreased are
identified in systems analysis. Figure MS-8 demonstrates the distribution in tissue of proteins in PrCa that are
significantly changed. Specifically, cytoplasmic, extracellular and nuclear proteins are the most affected. Similarly, in
Figure MS-9 the main molecular functions of the significantly changed proteins involve binding, catalytic activity and
structure of the tissue. The biological processes of the significantly changed proteins (Figure MS-10) involve cellular
process and regulation. An example of changes in a specific protein (Fibrillin-1) is demonstrated in Figure MS-11 and a
system analysis of SERPINA3 (ACT) is demonstrated in Figure MS-12. Of note, SERPINA3 is a very important molecule in
cellular responses to stress, control of immune responses and responses to cellular stimuli.

Evaluation by Mass Spectrometry of FABP1, FABP5 and FASN Based on Race: In the last quarterly report, we noted
that FABP1 was not detected by mass spectrometry; however, 7 cases had elevated FABP5 in tumors with a normalized
relative intensity (NRI) = 3.3, but no FABP5 was detected in uninvolved prostate. Also, 15 cases had prostate cancer in
which FASN was detected with an NRI = 4.6, but FASN was detected in only 2 cases of uninvolved prostate
(NRI = 2.0).

For FABP5, 3 of the 7 cases in which FABP5 was detected were in AAs (3/8) with an NRI = 6.6 and 4 of the 7 cases
were in EAs (4/11) with an NRI = 1.2 (note that one of the twelve EA patients was lost to analysis). These results indicate
that FABP5 is more strongly expressed in prostate cancers from AAs than in prostate cancers from EAs and are consistent
with results based on analysis at the mRNA level and with immunohistochemistry.

When FASN results were separated based on self-identified race, 7 of the 15 cases of FASN were in AAs (7/8) with an
NRI = 4.4 and 8 cases were in EAs (8/11) with an NRI = 4.8. The two cases in which FASN expression was detected in
uninvolved tissue also were in EAs (2/11) with an NRI = 2.

The 21 proteins that were decreased in specimens of prostate cancer from AA patients are listed in Table MS-2. We are still in
the process of evaluating the potential importance of the decreases in each of these specific proteins on the aggressiveness
of prostate cancers; however, the increased proteins remain of major interest.

Ancestry  Genotyping 

When  patients  of  different  racial  groups  are  analyzed  and  compared,  the  results  can  be  affected  by  racial  admixtures  in  the 
study  populations  which  initially  are  not  recognized.  Dr.  Rick  Kittles,  our  collaborator  on  this  DOD  project,  specializes  in 
studies  that  incorporate  ancestry  genotyping  into  studies  addressing  health  disparities.  The  Kittles  lab  analyzes  DNA 
samples extracted from de-identified subjects based on single nucleotide polymorphisms (SNPs); the lab uses a profile of 109
unlinked autosomal SNPs that have been selected as ancestry informative markers (AIMs) to differentiate individuals of
European,  West  African  and  Native  American  ancestry.  In  admixed  populations,  the  AIMs  results  can  then  be  used  to 
estimate  the  relative  proportion  of  these  3  racial  groups  in  each  individual’s  ancestry.  During  this  reporting  period,  the 
Kittles  lab  completed  ancestry  genotyping  for  126  of  our  study  subjects,  114  from  our  prostate  biopsy  tissue  print  series 
and  12  from  our  radical  prostatectomy  (FFPE  samples)  series.  The  results  of  this  analysis  were  provided  to  Dr.  Gaston 
who  has  correlated  the  racial  admixture  results  with  the  biopsy  and  tissue  print  results. 
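
The estimation of ancestry proportions from AIMs can be illustrated with a small sketch. It assumes that the reference-allele frequency of each marker is known in each of the three ancestral populations and fits one individual's ancestry proportions by expectation-maximization. This is an illustrative model only, not the Kittles lab's actual pipeline, and every frequency and genotype below is made up.

```python
def estimate_admixture(genotypes, freqs, n_iter=200):
    """Estimate one individual's ancestry proportions by EM.

    genotypes: per-SNP count (0, 1 or 2) of the reference allele.
    freqs: per-SNP tuple of reference-allele frequencies, one entry per
           ancestral population; values must lie strictly between 0 and 1.
    Returns a list of proportions that sums to 1.
    Assumption: unlinked markers and known population frequencies.
    """
    k = len(freqs[0])
    q = [1.0 / k] * k  # start from equal proportions
    n_alleles = 2 * len(genotypes)
    for _ in range(n_iter):
        totals = [0.0] * k
        for g, p in zip(genotypes, freqs):
            # Probability of drawing the reference / alternate allele
            mix_ref = sum(q[j] * p[j] for j in range(k))
            mix_alt = 1.0 - mix_ref
            for j in range(k):
                # Expected number of this SNP's allele copies from population j
                if g > 0:
                    totals[j] += g * q[j] * p[j] / mix_ref
                if g < 2:
                    totals[j] += (2 - g) * q[j] * (1.0 - p[j]) / mix_alt
        q = [t / n_alleles for t in totals]
    return q


# Hypothetical markers; population order assumed to be
# (West African, European, Native American)
example_freqs = [(0.9, 0.1, 0.1)] * 10 + [(0.1, 0.9, 0.5)] * 10
example_genotypes = [2] * 10 + [0] * 10
example_q = estimate_admixture(example_genotypes, example_freqs)
```

With these made-up inputs the genotypes match the first population at every marker, so the first proportion converges toward 1. A real analysis would use the 109 AIMs and reference frequencies from the genotyping panel.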

The  overall  pattern  of  racial  admixture  in  our  Birmingham  area  study  subjects  is  similar  to  what  has  been  observed  in 
other  US  populations.  As  expected,  many  self-identified  African  Americans  showed  genetic  evidence  of  ancestry 
admixture;  in  our  Birmingham  AA  subjects  this  admixture  was  almost  entirely  from  European  ancestors  with  only  rare 
individuals  showing  appreciable  Native  American  ancestry. 

The  AIMs  results  from  our  prospectively  enrolled  prostate  biopsy  study  subjects  were  particularly  interesting.  In  our 
prospective  study,  ancestry  genotyping  was  performed  on  DNA  prints  collected  prior  to  biopsy  and  AIMs  genotyping  was 
performed  by  the  Kittles  lab  blinded  to  both  self-identified  ancestry  and  biopsy  results.  Comparison  of  AIMs  genotypes 
and  biopsy  pathology  findings  for  the  83  self-identified  AA  subjects  in  the  prospective  biopsy  study  showed  that,  as  a 
group,  the  men  who  were  diagnosed  with  high  grade  PrCa  (Gleason  sum  7  or  more)  were  more  likely  to  have  genotype 
estimates  of  more  than  0.75  West  African  Ancestry,  as  compared  to  the  men  who  were  diagnosed  with  no  cancer  (P  = 
0.001)  (Figure  1,  Figure  2,  Table  1).  A  similar  trend  is  observed  in  a  comparison  of  West  African  Ancestry  between  the 
men  diagnosed  with  high  grade  cancer  (Gleason  sum  7  or  more)  vs  low  grade  cancer  (Gleason  sum  6).  To  our  knowledge, 
no  previous  studies  have  evaluated  the  levels  of  West  African  ancestry  within  a  self-identified  African  American 
population  as  a  marker  for  relative  risk  of  prostate  cancer.  It  should  be  noted  that  our  studies  use  an  ancestry  genotyping 
panel  with  a  relatively  small  number  of  well-established  ancestry  informative  markers,  and  that  this  type  of  molecular 
testing  is  relatively  inexpensive.  Thus  confirmation  of  a  significantly  increased  risk  of  prostate  cancer  (and  potentially,  of 
high  grade  prostate  cancer)  in  African  American  men  with  relatively  high  levels  of  West  African  genetic  ancestry  could 
have  immediate  potential  clinical  applications  for  prostate  cancer  screening  and  active  surveillance.  These  findings  were 
first  presented  during  this  reporting  period  at  an  invited  Minorities  in  Cancer  Research  Scientific  Symposium  entitled 
“Emerging  Methodology  and  Tools  for  Understanding  the  Genetics  of  Cancer  Disparities”  at  the  2015  Annual  Meeting  of 
the  American  Association  for  Cancer  Research. 
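
The comparison of ancestry category against biopsy outcome can be framed as a 2x2 table. The report does not state which test produced the P value above; the sketch below assumes a two-sided Fisher exact test and uses hypothetical counts that are not the study's data.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]].

    Assumption: illustrative test choice; the report does not name its test.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    # Sum over all tables that are no more likely than the observed one
    return sum(prob(x) for x in range(lo, hi + 1)
               if prob(x) <= p_obs * (1 + 1e-9))


# Hypothetical counts (rows: high grade PrCa, no cancer;
# columns: WA ancestry > 0.75, WA ancestry <= 0.75)
example_p = fisher_exact_two_sided(19, 1, 18, 12)
```

The small tolerance in the comparison guards against floating-point ties between equally likely tables.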

Activities planned for next quarter: We have submitted DNA from an additional 61 study subjects to the Kittles laboratory
for AIMs ancestry analysis. The Kittles lab has indicated that the analysis of these samples should be completed in time
for us to present an update of our findings at the November 2015 AACR Conference on the Science
of Cancer Health Disparities.

mRNA Gene Expression Analysis

Affymetrix Whole Transcriptome mRNA Gene Expression Profiling and qrtPCR Confirmatory Analysis: In this
study we are using the Affymetrix Human Transcriptome Array 2.0 (HTA 2.0) to identify genes involved in
prostate cancer and then confirm genes of interest using quantitative rtPCR (qrtPCR) technology. The HTA 2.0 is currently
the most comprehensive array for interrogating human transcript isoforms for expression profiling. In addition to
gene-level detection, this array provides the coverage and accuracy required to detect all known human transcript
isoforms produced from a gene. The data sources used to design and annotate the HTA 2.0 array are
RefSeq, Ensembl, UCSC Known Genes, UCSC lincRNA transcripts and the Broad Institute Human Body Map lincRNAs
and TUCP (transcripts of uncertain coding potential) catalog. As with most gene profiling techniques, the HTA 2.0 array
performs best with high quality RNA, and because we routinely prepare RNA from biopsy tissue prints with RINs better
than 7 (total RNA per prostate biopsy print approximately 200 ng), we have been able to take full advantage of this
technology.

In  addition  to  conventional  analysis  of  our  prostate  biopsy  gene  expression  data,  we  have  found  an  approach  described  by 
Gorlov  et  al  (2014)  to  be  highly  productive.  These  authors  observed  that  while  a  typical  approach  to  analyzing  tumor  gene 
expression  compares  cancer  to  adjacent  uninvolved  tissue,  an  analysis  of  inter-individual  tumor-to-tumor  variation  in  gene 
expression  can  be  a  more  efficient  way  to  identify  genes  that  are  over  or  under  expressed  in  a  molecular  subgroup.  One 
important  advantage  to  this  type  of  tumor-to-tumor  analysis  is  that  it  is  not  confounded  by  cancer-associated  changes  in 
adjacent  normal-looking  tissue  (cancer  “field  effects”).  Our  pairwise  analysis  of  tumor-to-tumor  variation  in  biopsies  from 
AA  and  EA  patients  with  high  grade  prostate  cancer  has  identified  several  robustly  overexpressed  “outlier”  genes  of 
interest.  These  include  genes  involved  in  fatty  acid  processing  and  metabolism.  Most  notably,  we  identified  a  set  of  AA 
PrCa  with  extremely  high  (over  10  fold)  overexpression  of  fatty  acid  binding  protein  5  (FABP5).  Although  FABP5  has 
not  been  a  major  focus  of  PrCa  research,  qrtPCR  studies  confirmed  this  “super  over-expression”  pattern  in  a  PrCa 
subgroup and identified a second PrCa subgroup with high overexpression of fatty acid synthase (FASN). Since our last
annual report, we have focused much of our effort on further characterizing these two previously unrecognized PrCa
subtypes.

At the level of the biopsy core, approximately 15-20% of the high grade prostate cancers show outlier “super
overexpression” of FABP5 mRNA at more than 10 fold over the baseline expression observed in benign prostate biopsies
from patients diagnosed with no cancer. A similar 15-20% of high grade prostate biopsy cores show outlier “super
overexpression” of FASN mRNA at more than 10 fold over baseline expression in benign cores from cases with no cancer
(GE Figure 1). Outlier super overexpression of FABP1 is also observed, but is less prevalent in our high grade biopsy
cores (about 8-10%). Comparison of same-core mRNA expression patterns shows that while some tumors show moderate
overexpression of more than one of these three markers, super overexpression is observed in an “either-or” pattern.

Tumors with top quartile levels of FASN do not show super overexpression of either of the binding proteins, and vice
versa (GE Figure 2). This observation suggests that there are two different ways for prostate cancers to satisfy their
increased demands for fatty acids, either by de novo synthesis (FASN) or by increased uptake from the extracellular
environment (FABPs). When we look at expression patterns at the level of the study subject and consider the highest
expression in any core, we see that the pattern is consistent with that seen by IHC in an independent set of samples, with AA
predominant in the FABP5 super overexpressors and EA predominant in the FASN super overexpressors. Interestingly,
some multi-focal cancers show both FABP5 and FASN overexpressing clones, perhaps showing synergy as one focus
synthesizes new fatty acids (FASN overexpression) and the adjacent focus takes advantage of the excess (FABP
overexpression). If these observations are confirmed as we move forward, we may be able to identify patients whose
prostate cancer can be selectively targeted by pharmacological or dietary interventions that target these two lipid
processing/synthesis pathways.
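
The outlier calls described above (more than 10 fold over the benign baseline at the core level, and the highest expression in any core at the subject level) can be sketched in a few lines of Python. This is only an illustrative sketch, not the analysis pipeline used in this project: the function names, the use of the median of benign cores as the baseline, the assumption of linear-scale expression values, and all numbers in the example are hypothetical.

```python
from statistics import median

# Threshold taken from the text: "more than 10 fold" over the benign baseline.
SUPER_FOLD = 10.0


def fold_over_baseline(value, benign_values):
    """Fold change of one core relative to the median of benign cores.

    Assumes linear-scale (not log2) expression values; using the median
    as the baseline summary is an assumption of this sketch.
    """
    baseline = median(benign_values)
    if baseline <= 0:
        raise ValueError("benign baseline must be positive")
    return value / baseline


def super_overexpressed(core, benign):
    """Markers in one biopsy core that exceed SUPER_FOLD over baseline."""
    return {
        marker
        for marker, value in core.items()
        if fold_over_baseline(value, benign[marker]) > SUPER_FOLD
    }


def subject_call(cores, benign):
    """Subject-level call: a marker is flagged if any core exceeds the threshold."""
    flagged = set()
    for core in cores:
        flagged |= super_overexpressed(core, benign)
    return flagged


# Hypothetical values for illustration only (not study data).
benign = {"FABP5": [0.8, 1.0, 1.2], "FASN": [1.5, 2.0, 2.5]}
cores = [
    {"FABP5": 15.0, "FASN": 3.0},  # FABP5 outlier core
    {"FABP5": 1.1, "FASN": 25.0},  # FASN outlier core
]
print(super_overexpressed(cores[0], benign))
print(sorted(subject_call(cores, benign)))
```

In this toy example each core is an outlier for only one marker, while the subject-level call contains both, which mirrors the multi-focal case described in the text.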

Analysis of Epigenetic Cancer-Associated Changes in DNA Methylation Patterns:

Scientific  Progress  and  Results:  DNA  extracted  from  prostate  biopsy  tissue  print  nitrocellulose  blots  has  proven  to  be  very 
well suited for studies of genes in uninvolved normal appearing prostate glands postulated to be hypermethylated by field
effects secondary to adjacent prostate cancers. These studies have shown that the level of promoter hypermethylation of
the  genes  GSTP1,  APC  and  RASSF1  in  normal  appearing  tissue  (field  effect  hypermethylation)  is  more  intense  when  the 
prostate  contains  high  grade  cancer,  compared  to  that  found  in  a  prostate  with  only  low  grade  (Gleason  3+3)  cancer.  This 
finding  is  potentially  important,  because  it  shows  that  a  relatively  straightforward  modification  of  a  currently  available 
clinical  test  may  be  useful  in  identifying  patients  who  are  considering  active  surveillance  based  on  a  biopsy  that  failed  to 
detect  a  high  grade  prostate  cancer  due  to  sampling  error.  During  this  last  reporting  period,  we  identified  cut-off  values 
that  optimize  this  prototype  biomarker  test  for  predicting  a  low  risk  of  occult  high  grade  cancer.  This  work  was  presented 
at  the  annual  meeting  of  the  American  Urological  Association  in  2015.  This  biostatistical  analysis  completes  the 
manuscript  that  is  currently  being  prepared  for  submission  to  PLOS  ONE. 

Additional data analysis comparing our field effect results with studies performed at Johns Hopkins Hospital in Baltimore is
currently in progress; preliminary results have been submitted as an abstract to be presented at the 2016 GU ASCO
conference. In addition, with support from the UAB Cancer Center, we will undertake a pilot study of the DNA
hypermethylation patterns in areas of the prostate that are “suspicious” for prostate cancer. As noted in the section on MRI
guided prostate biopsy studies, comparisons of AA and EA patients show a trend in which AA patients are more likely to
have MRI suspicious regions that are negative for cancer on subsequent biopsy. We will test the hypothesis that a
molecular test for cancer field effects may help differentiate MRI suspicious regions that contain an occult prostate
cancer that was missed due to biopsy sampling error from MRI suspicious regions that are truly false-positive.


FABP5 and FASN are two lead molecular markers for our future studies because they may identify PrCa subtypes that are
differentially prevalent in AA and EA, potentially significant as alternative fatty acid phenotypes that can be targeted
therapeutically, and potentially visible by multiparametric MRI (MP-MRI) as a result of changes in tissue composition.
Additional markers that are under evaluation based on Affymetrix gene discovery data include fatty acid binding protein 1
(FABP1), elongation of very long-chain fatty acid 2 (ELOVL2), neuropeptide Y (NPY) and VEGF A. Additional markers
under evaluation based on mass spectrometry discovery data include zinc-alpha-2-glycoprotein, a controller of lipolysis
hypothesized to be involved in cancer cachexia, and galectin-3-binding protein, reported in other cancers to affect
aggressiveness.

FUTURE  DIRECTIONS 

•  We  will  focus  on  increasing  the  number  of  patients  for  whom  we  will  obtain  tissue  prints  of  biopsies  of  the  prostate. 
Most  of  the  emphasis  will  be  on  patients  who  are  undergoing  MRI-US  guided  biopsies.  These  patients  also  will  have 
standard  US  guided  biopsies  and  tissue  prints  will  be  obtained  on  all  biopsy  cores.  Of  special  importance  will  be  tissue 
prints  from  AA  patients.  Our  goal  is  to  have  a  manuscript  submitted  on  this  research  in  November,  2015  (Drs.  Grizzle  and 
Gaston). 

• We will analyze ancestry informative markers (AIMs) from tissue prints to characterize racial admixtures (Dr. Gaston
and Dr. Kittles) and will analyze mRNAs from tissue prints for the genes of focus: FABP5, FABP1, FASN and
zinc-alpha-2-glycoprotein.

•  We  will  analyze  racial  admixtures  from  paraffin  sections  of  cases  analyzed  by  immunohistochemistry  (Drs.  Gaston  and 
Kittles). 

• We will analyze mRNA gene expression patterns from paraffin sections of radical prostatectomy cases analyzed by
immunohistochemistry and MS in order to more completely characterize mRNA-protein-histology correlations for the
genes involved in lipid processing and metabolism, with a particular focus on the high-expression prostate cancer
subtypes that we have identified in our analyses of the prostate biopsy tissue prints (Drs. Gaston and Grizzle).

• For our immunohistochemical study of FABP5, FABP1 and FASN, we will analyze additional cases with Gleason
scores of 6, 4+3, and 8-10. These cases will be selected so that each GS group will have balanced racial representation.
We also will increase our analysis of normal prostate specimens from radical cystectomies (Dr. Grizzle).

•  If  resources  permit,  we  will  establish  ELISA  and  multiplex  immunoassays  using  samples  of  serum  which  are  less  than  1 
year  old.  The  ELISA  will  focus  on  FABP5.  The  multiplex  immunoassay  will  focus  on  our  prior  studies  of  molecules  that 
are  increased  in  multiplex  assays  and  new  molecules  identified  by  MS  (Dr.  Grizzle). 

• Our goal is to add immunohistochemistry of zinc-alpha-2-glycoprotein and galectin-3-binding protein to our MS results
and submit a paper by November 2015 (Dr. Grizzle).

• We currently are preparing a manuscript on this work in which we will perform immunohistochemistry for
zinc-alpha-2-glycoprotein and galectin-3-binding protein to demonstrate variations with race.

KEY  RESEARCH  ACCOMPLISHMENTS 

• We have found that self-identified AAs who are diagnosed with PrCas with Gleason scores of > 7 on prostate biopsy
have a higher proportion (95%) of individuals with West African ancestry > 75% than do AAs with Gleason scores of
6 or with no PrCa on biopsy.

• We have identified FABP5 and FASN as molecules that are strongly expressed in PrCas (p<0.0001 for both). Of
these cases, FABP5 is selectively expressed in AAs and FASN is selectively expressed in EAs.

• We have identified that FABP1 is slightly overexpressed in most PrCas at the protein level, but at the mRNA level it is
highly overexpressed in a significant subset of patients with PrCa.

• By mass spectrometry we have identified 53 molecules that are overexpressed and 31 molecules that are
under-expressed in PrCas. We have identified 10 molecules that are overexpressed in AAs compared to EAs and 21 molecules
that are under-expressed. One of the 10 molecules overexpressed in PrCas of AAs is zinc-alpha-2-glycoprotein, which is involved
in lipolysis and hypothesized to cause cachexia of cancer. Another molecule of interest in PrCas of AAs is
galectin-3-binding protein, which has been associated with aggressiveness in other cancers.

REPORTABLE  OUTCOMES 

1. The abstract “Limitations of the use of human prostate tissues in biomedical research” was presented by Dr.
Grizzle at the Prostate Cancer Foundation 21st Annual Scientific Retreat, Carlsbad, CA, October 23, 2014.

2. The abstract “Performance of an epigenetic assay to predict prostate cancer aggressiveness: Comparing Gleason
score and NCCN risk categories” was presented at the EAU Section of Urological Research (ESUR) meeting,
October 9-11, 2014, Glasgow, Scotland.

3. Dr. Grizzle was one of 5 presenters (Barnes M, Bledsoe MJ, Dressler L, Grizzle WE, Russell-Einhorn M) and
panel participants in the all-day pre-meeting conference “Contemporary Issues in Biobanking: Governance,
Consent and Practical Approaches to Current Challenges.” This preceded the conference “Advancing Ethical
Research” of the organization Public Responsibility in Medicine and Research (PRIM&R), Baltimore, MD,
December 4, 2014.

4.  An  abstract  “DNA  hypermethylation  field  effects:  potential  applications  for  detection  of  occult  high  grade 
prostate  cancer”  was  presented  at  the  2015  Genitourinary  Cancers  Symposium  of  ASCO  on  February  26,  2015. 

5. An invited oral presentation on “Tissue Print Technologies for the Preparation of High Quality Human
Biospecimens” was presented by Dr. Gaston at the SELECTBIO Sample Preparation and Analysis Technologies
conference in Boston, Massachusetts, March 2015.

6. The abstract “Epigenetic Assay Stratifies Prostate Cancer Patients’ Risk” was presented by Dr. Gaston at the 2015
American Urological Association Annual Meeting in May 2015.

7.  An  invited  podium  (oral)  presentation  “The  use  of  innovative  prostate  biopsy  tissue  print  techniques  for  molecular 
genomic,  epigenetic  and  gene  expression  studies”  was  given  by  Dr.  Gaston  at  the  Minorities  in  Cancer  Research 
Scientific  Symposium  “Emerging  Methodology  and  Tools  for  Understanding  the  Genetics  of  Cancer  Disparities” 
at  the  2015  Annual  Meeting  of  the  American  Association  for  Cancer  Research. 

8.  An  invited  oral  presentation  “The  Use  of  Tissue  Prints  of  Prostate  Cancer  Biopsies  for  the  Analysis  of  Non- 
Resected  Prostate  Cancer”  was  presented  by  Dr.  Gaston  at  the  Illumina  Key  Opinion  Leader  Biobank  Summit  in 
Boston,  MA,  May  2015. 

9.  An  invited  oral  presentation  on  “Tissue  resources  and  the  Association  of  Racial  Admixtures  and  the  Risk  for  High 
Grade  Prostate  Cancer”  was  presented  by  Dr.  Grizzle  at  the  Illumina  Key  Opinion  Leader  Biobank  Summit  in 
Boston,  MA,  May  2015. 

10.  Abstract  accepted  entitled  “Improving  the  accuracy  and  diagnostic  power  of  prostate  biopsy  for  African  American 
patients:  the  Birmingham  Alabama  Prostate  Cancer  (BAPrCa)  Consortium”  to  be  presented  by  Dr.  Gaston  at  the 
Eighth  AACR  Conference  on  the  Science  of  Cancer  Health  Disparities  in  Racial/Ethnic  Minorities  and  the 
Medically  Underserved  in  Atlanta,  GA,  November  2015  (copy  attached  to  this  report) 

11. Abstract presentation entitled “Combined DNA-Methylation Intensity and Clinical Risk Score Stratifies Patients
for High-Grade Disease” at the EAU Section of Urological Research (ESUR) meeting in Nijmegen, The
Netherlands, September 2015 (copy attached to this report).

12.  Invited  oral  presentation  entitled  “Tissue  print  technologies:  An  innovative  and  practical  approach  to  obtaining 
high  quality  research  samples  from  biopsies  and  other  challenging  biospecimens”  will  be  presented  by  Dr.  Gaston 
at  IIR's  Biorepositories  and  Sample  Management  Summit  in  Boston,  MA,  October  2015  (copy  of  the  abstract  for 
this  presentation  attached  to  this  report) 

13.  Invited  submission  of  FY15-FY16  PCRP  program  materials  describing  this  DOD  sponsored  project;  these 
materials  include  text  and  images  for  the  program  booklet,  CDMRP  website  features,  and  the  PCRP  newsletter 
(PCRP  Perspectives).  September  2015  (copy  of  the  invitation  is  attached  to  this  report). 

14. Invitation to Dr. Grizzle and Dr. Gaston to apply for the DOD PCRP Health Disparity Research Award.
Application submitted September 24, 2015.

15. With our collaborator Dr. Soroush Rais-Bahrami as PI, Drs. Gaston and Grizzle are co-investigators on a newly
awarded pilot grant from the UAB Cancer Center to explore potential correlations between MRI parameters that
define regions of the prostate as “suspicious” for prostate cancer and molecular prostate cancer field effects.
September 2015.

16. Book Chapter in Press

Burke HB, Grizzle WE. Clinical Validation of Molecular Biomarkers in Translational
Medicine. In: Biomarkers in Cancer Screening and Early Detection, Sudhir Srivastava, editor,
Wiley, Oxford, UK.

CHALLENGES AND PROBLEMS


Our main challenge affects all research in prostate cancer. Specifically, it is difficult to define “aggressive” PrCas
except by association with Gleason score because of the many indolent PrCas and the long time it takes to define a
recurrence of these tumors. Thus, we rely on Gleason score as a correlate of aggressiveness except for tumors that are
known to recur. Similarly, when working with bodily fluids, the “controls” (i.e., cases without PrCa) are difficult to
define because PrCas tend to be asymptomatic and biopsied cases may be false negatives in that the lesions may be
missed on biopsy. This is one reason that we have begun to select cases biopsied by MRI-US. Also, we are still trying
to define changes in bodily fluids that may occur in storage at -80°C or colder. We are addressing this by trying to
analyze samples of bodily fluids that are < 1 year of age. We also will match samples by month of age. Our final
challenge is trying to decide when to publish our positive results. We have now collected tissue prints from over 100
AAs and 80 EAs. We await the ancestry informative markers from more recent cases with a goal of publishing our
initial manuscripts in November 2015. Other than those associated with the above challenges, we have no major problems
of which we are aware.

REFERENCES 

Potter DM, Butterfield LH, Divito SJ, Sander CA, Kirkwood JM. Pitfalls in retrospective analyses of biomarkers: A case
study with metastatic melanoma patients. J Immunol Methods. 2012;376(1-2):108-112.

Gorlov  IP,  Yang  JY,  Byun  J,  Logothetis  C,  Gorlova  OY,  Do  KA,  Amos  C.  How  to  get  the  most  from  microarray  data: 
advice from reverse genomics. BMC Genomics. 2014;15:223.

SUPPORTING DATA

Study Subjects Enrolled and Biopsy Tissue Prints Collected*

Tissue Prints Collected from UCA and UAB as of September 30, 2015

                 All races     AA     EA
All Subjects        187       103     84
Benign               90        52     38
High Grade           58        29     29
Low Grade            39        22     17

* Note that Enrollment Tables Exclude 2 Study Subjects:
1 Withdrawal from UCA and 1 Omitted from UAB (both AA)

EN Table 1


Enrollment at UCA and UAB Study Sites*

Tissue Prints Collected from UCA as of September 30, 2015

                 All races     AA     EA
All Subjects         59        59      0
Benign               33        33      0
High Grade           12        12      0
Low Grade            14        14      0

Tissue Prints Collected from UAB as of September 30, 2015

                 All races     AA     EA
All Subjects        128        44     84
Benign               57        19     38
High Grade           46        17     29
Low Grade            25         8     17

* Note that Enrollment Tables Exclude 2 Study Subjects:
1 Withdrawal from UCA and 1 Omitted from UAB (both AA)

EN Table

Summary of All Study Subjects

                                            African American      European American
Diagnosis Group                               N     % of 103        N     % of 84
All Subjects                                 103      100%          84      100%
No Cancer on Biopsy (Benign Diagnosis)        52       50%          38       45%
Ca Positive, Low Grade (Only Gl sum 6)        22       21%          17       20%
Ca Positive, Gl sum 3+4                       17       17%          12       14%
Ca Positive, Gl sum 4+3 or more               12       12%          17       20%

Summary of Cancer Positive Study Subjects

                                            African American      European American
Diagnosis Group                               N     % of 51         N     % of 46
Cancer Positive Subjects                      51      100%          46      100%
Ca Positive, Low Grade (Only Gl sum 6)        22       43%          17       37%
Ca Positive, Gl sum 3+4                       17       33%          12       26%
Ca Positive, Gl sum 4+3 or more               12       24%          17       37%

EN Table 3
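
The percentage columns in EN Table 3 follow directly from the subject counts. The short Python sketch below recomputes them as a consistency check; the counts are transcribed from the table, while the function and variable names and the rounding to whole percentages are assumptions made for illustration.

```python
def percent_by_group(counts, total):
    """Whole-number percentage of `total` for each diagnosis group."""
    return {group: round(100 * n / total) for group, n in counts.items()}


# Subject counts transcribed from EN Table 3.
aa_all = {"Benign": 52, "Gl 6": 22, "Gl 3+4": 17, "Gl 4+3 or more": 12}
ea_all = {"Benign": 38, "Gl 6": 17, "Gl 3+4": 12, "Gl 4+3 or more": 17}

# All study subjects: denominators are 103 (AA) and 84 (EA).
aa_pct = percent_by_group(aa_all, sum(aa_all.values()))
ea_pct = percent_by_group(ea_all, sum(ea_all.values()))

# Cancer positive subjects only: drop the benign row and renormalize,
# which gives denominators of 51 (AA) and 46 (EA).
aa_ca = {g: n for g, n in aa_all.items() if g != "Benign"}
ea_ca = {g: n for g, n in ea_all.items() if g != "Benign"}
aa_ca_pct = percent_by_group(aa_ca, sum(aa_ca.values()))
ea_ca_pct = percent_by_group(ea_ca, sum(ea_ca.values()))

print(aa_pct, ea_pct)
print(aa_ca_pct, ea_ca_pct)
```

The same helper can be reused for the other enrollment tables by substituting their counts.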


[EN Figure 1 panels: African American, N = 103; European American, N = 84; labeled category in each panel: Gl 4+3 or more]

EN Figure 1: Biopsy Diagnosis in AA vs EA Study Subjects

MRI Targeted Biopsy Study Subjects Enrolled as of September 30, 2015

Diagnosis Group    All races     AA    % of 11     EA    % of 47
All Subjects           58        11     100%       47     100%
Benign                 28         8      73%       20      43%
High Grade             19         1       9%       18      38%
Low Grade              11         2      18%        9      19%

EN Table 4


MRI Targeted Biopsy Subjects as of September 30, 2015
Comparison of Standard and Targeted Cores

Diagnosis Group    Subjects    All Std Cores    All Tgt Cores    Low Gl Cores    High Gl Cores
All Subjects           58           420              324              50              64
Benign                 28           181              155               0               0
High Grade             19           155               94              21              64
Low Grade              11            84               75              29               0

Conventional (non-MRI) Biopsy Subjects as of September 30, 2015

Diagnosis Group    Subjects    Standard Cores    Low Gl Cores    High Gl Cores
All Subjects          129          1675              114             173
Benign                 62           871                0               0
High Grade             39           468               47             173
Low Grade              28           336               67               0

EN Table 5 and 6


[Figure IHC-1 plots: FABP5 cytoplasmic, nuclear and perinuclear scores in Normal (n = 13), Uninvolved (n = 54) and Cancer (n = 53) prostate tissue; symbols distinguish European American and African American cases; marked comparisons p < 0.0001]

Figure IHC-1 Expression of FABP5 in the prostate. This figure demonstrates FABP5 staining
broken down by intracellular pattern of staining: cytoplasmic, membrane, nuclear
and perinuclear. Normal is based on prostates removed during radical cystectomy. These
prostates did not have PrCa on pathologic examination. Uninvolved represents the matching
normal appearing prostate glands from cases of prostate cancer, while "cancer" indicates the
PrCa matching the uninvolved glands. Note that there are statistically significant differences
between PrCa (cancer) and normal (p < 0.0001) and between PrCa and uninvolved glands (p < 0.0001).
Of interest, among cases with high levels of expression of FABP5, there are somewhat more African
Americans (Table IHC-1) in all categories of expression, but especially in nuclear expression.

Legend

Figure IHC-2 Expression of FABP5 in PrCa. This figure demonstrates the intracellular
distribution of staining in PrCa: cytoplasmic, membrane, nuclear and perinuclear. The PrCa is
broken down according to the Gleason Score (GS) of the case, i.e., GS 6, GS 3+4, GS 4+3, and
GS 8-10. These are the cases randomly selected to date. The number of cases with GS 6,
4+3, and 8-10 will be expanded in the next quarter.

Legend:

Figure IHC-3 Expression of FABP5 in PrCa. This figure demonstrates the intracellular
distribution of FABP5 in an area of prostate cancer (red block arrows) adjacent to
uninvolved glands with minimal staining (green block arrows). Magnification x400, focally
increased to x630. Thin black arrows point to nuclear staining with FABP5 and red arrows
(thin and block) point to staining of the cell membrane.

There is variable staining of cells with FABP5 even within the same malignant gland and
there is little to no staining in uninvolved glands. FABP5 staining is increased at the
membranes of cells and in the nuclei of some cells. This is emphasized in Figure IHC-1.

Figure IHC-4: Expression of FABP1 in the prostate. This figure is similar to IHC-1 except
that FABP1 is being evaluated. Of note, there is much less differential expression of FABP1
between the normal glands and PrCa and between the normal appearing uninvolved glands and PrCa
than there is for either FABP5 (Figure IHC-1) or FASN (Figure IHC-7). Although this degree of
differential expression is small, for cytoplasmic expression and membrane expression there is a
statistically significant difference when uninvolved glands are compared with PrCa.

Figure IHC-5 Expression of FABP1 in PrCa. This figure demonstrates the range of
expression of FABP1 in tumors broken down by Gleason scores (GS). These are the cases
randomly chosen. In the next quarter we will target GS cases of 6, 4+3, and 8-10 that are

Legend:

Figure IHC-6 Expression of FABP1 in PrCa. Panel A (x200) demonstrates the intracellular
expression of FABP1 in an area of PrCa (red block arrows) adjacent to uninvolved prostate
glands (green block arrows). At x630 magnification, in panels B, C and D, the luminal cells
of uninvolved prostate glands (green block arrows) and PrCa (red block arrows) have
membrane expression (blue arrows) while the thin black arrows point to nuclear staining
of FABP1 in PrCa. In this case the staining of the nuclei for FABP1 is less than cytoplasmic
staining.

There is consistent staining of the cytoplasm with FABP1 with increased staining of the
cell membranes in both uninvolved luminal cells and PrCa cells. There is little difference
in the expression of FABP1 between PrCa and uninvolved luminal cells; however, because
of the large number of cases, the differences are statistically significant. Overall this
pattern is consistent with the results summarized in Figure IHC-4.

Legend

Figure IHC-7 Expression of FASN in the prostate. This figure demonstrates the differential expression of
FASN in glands of "normal" prostate compared to "cancer" (PrCa) (p<0.0001). The normal prostate glands
are from prostates removed as part of radical cystectomies that were found on pathologic
examination to not have PrCa. Similarly, uninvolved (normal appearing) prostate glands were compared
with the matching PrCas as to the differential expression of FASN (both p<0.0001).

Of note, most of the higher values of FASN (cytoplasmic and membrane) were in the EA population, in
contrast to FABP5; however, this was not the case in the uninvolved prostate, in which AAs predominated. A
similar pattern was seen in perinuclear staining (p<0.0001). In nuclear staining the pattern also was similar
for uninvolved (p<0.0001) but not in the normal prostate. Normal versus PrCa showed differential staining
(p=0.0034).

Legend

Figure IHC-8 Expression of FASN in PrCa. This figure demonstrates the expression of FASN
in PrCa separated by Gleason scores (GS). These cases were randomly selected and results
are expressed as cytoplasmic, membrane, nuclear and perinuclear intracellular patterns.
Of note, we need more cases of Gleason scores 6, 4+3, and 8-10. Of importance, the
higher values of FASN tend to occur in EA patients. In the next quarter we will add the
needed cases and adjust the racial mix.




Legend 

Figure  IHC-9  Expression  of  FASN  in  the  prostate.  Panel  A  original  magnification  x200 
demonstrates  an  area  where  high  grade  PrCa  (red  block  arrows)  surrounds  uninvolved 
prostate  ducts  (green  block  arrows).  Panel  B  original  magnification  x200  is  an  area  in  which 
foci  of  PrCa  (red  block  arrows)  are  surrounded  by  lymphocytes  which  are  not  stained  by 
FASN.  In  this  high  grade  PrCa,  the  thin  black  arrows  point  to  nuclei  in  which  FASN  is 
expressed.  Panel  C  (x630)  demonstrates  high  grade  prostate  cancer  (red  block  arrows)  with 
most  nuclei  that  have  no  FASN  staining  as  does  Panel  D  (x630).  However,  some  nuclei  of 
both  Panels  C  and  D  also  contain  nuclei  which  are  stained  with  FASN. 
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Figure  IHC-10  Correlation  of  FABP5  expression  with  the  expression  of  FASN  in  PrCa.  This  figure  was  prepared 
to  test  if  the  same  changes  observed  at  the  mRNA  level  in  which,  for  some  case  of  PrCa,  the  mRNA  of  FABP5 
and  FASN  were  oppositely  expressed  (i.  e.,  Is  FASN,  4,  FABP5,  or  \|/FASN,  FABP5)  were  present  at 
at  the  protein  level.  This  pattern  was  not  observed  for  any  of  the  intracellular  components.  What  is 
observed  is  that  higher  FABP5  scores  tend  to  be  enriched  in  AAs  and  higher  FASN  scores  are  enriched  in 
EAs  (Table  IHC-1).  This  is  especially  apparent  for  nuclear  and  perinuclear  expression. 
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Legend 

Figure  IHC-11  Correlation  of  FASN  with  FABP1  in  PrCa.  As  with  Figure  IHC-10,  no  apparent 
pattern  of  opposite  expression  between  FASN  and  FABP1  was  noted.  A  pattern  of  increased 
expression  of  FASN  in  EAs  is  apparent  at  the  cytoplasmic,  membrane,  nuclear  and 
perinuclear  areas  of  the  cell. 
Figure IHC-12 Correlation of FABP5 with FABP1 in PrCa. Again, there does not appear to be an
inverse correlation between the expression of FABP5 and FABP1. Of note, compared with
FABP1, there is a relative increase in FABP5 expression in AAs at the cytoplasmic,
membrane, nuclear, and perinuclear areas of malignant cells. This is especially apparent in
the nuclear and perinuclear areas of the cells.
Table IHC-1 Comparison of FABP5 versus FASN Phenotypic Expression

Marker and Intracellular Component | Cutoff | AAs (% > Cutoff) | EAs (% > Cutoff)
FABP5, Cytoplasmic | 2.5 | 26 | 23
FABP5, Membrane | 2.5 | 43 | 42
FABP5, Nuclear | 2.0 | 35 | 19
FABP5, Perinuclear | 2.5 | 35 | 26
FASN, Cytoplasmic | 2.0 | 17 | 35
FASN, Membrane | 2.5 | 4 | 23
FASN, Nuclear | 1.0 | 34 | 29
FASN, Perinuclear | 2.0 | 26 | 42
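
The "% > Cutoff" entries in Table IHC-1 can be illustrated with a short sketch. This is only an assumed reading of the table: the function name `percent_above_cutoff` is hypothetical, the scores below are invented and are not data from this study, and the comparison is taken to be strictly greater than the cutoff, as the column heading suggests.

```python
from typing import Sequence


def percent_above_cutoff(scores: Sequence[float], cutoff: float) -> float:
    """Percentage of cases whose staining score is strictly greater than `cutoff`."""
    if not scores:
        raise ValueError("no scores supplied")
    above = sum(1 for s in scores if s > cutoff)
    return 100.0 * above / len(scores)


# Invented staining scores for illustration only (not study data).
aa_scores = [0.5, 1.0, 2.5, 3.0, 2.2, 1.8, 0.0, 2.1]
ea_scores = [0.5, 1.5, 1.0, 2.4, 0.8, 1.9, 0.2, 1.1]
cutoff = 2.0  # hypothetical cutoff for one marker and cellular compartment

print("AA % > cutoff:", percent_above_cutoff(aa_scores, cutoff))
print("EA % > cutoff:", percent_above_cutoff(ea_scores, cutoff))
```

Applying the same function per marker and per cellular compartment, with the cutoff listed for that row, would give one AA and one EA percentage per table row.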
FIGURE MS-1


Legend 
Figure  MS-1 

After filtering the data ~900 proteins were identified with
<0.1% FDR. Of those, 514 proteins were found to be
identified in >30% of patient specimens for each arm. We
have found that at least 30% of samples per statistical arm
must have quantifiable peptides in order to obtain robust
analysis (we call this a commonality filter). However, we
do go back and pull out the data for specific proteins of
interest.
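
The commonality filter described in this legend can be sketched as follows. This is a minimal illustration under stated assumptions, not the pipeline used in the study: the function name `commonality_filter`, the nested-dictionary layout, and the toy values are all hypothetical, and `None` stands for a specimen in which no quantifiable peptide was found. The legend says ">30%" in one sentence and "at least 30%" in the next; the sketch uses "at least".

```python
from typing import Dict, List, Optional

# Hypothetical layout: protein -> arm -> per-specimen abundance (None = not quantified).
Quant = Dict[str, Dict[str, List[Optional[float]]]]


def commonality_filter(data: Quant, min_fraction: float = 0.30) -> List[str]:
    """Return proteins quantified in at least `min_fraction` of specimens in every arm."""
    kept = []
    for protein, arms in data.items():
        passes = True
        for values in arms.values():
            quantified = sum(v is not None for v in values)
            if not values or quantified / len(values) < min_fraction:
                passes = False
                break
        if passes:
            kept.append(protein)
    return kept


# Toy data, invented for illustration only.
toy = {
    "protein_A": {"tumor": [1.2, None, 0.9, 1.1], "uninvolved": [1.0, 1.3, None, None]},
    "protein_B": {"tumor": [None, None, None, 0.8], "uninvolved": [1.1, 1.0, 0.9, 1.2]},
}

print(commonality_filter(toy))        # 30% threshold per arm
print(commonality_filter(toy, 0.50))  # stricter 50% threshold per arm
```

In the toy data, protein_B is dropped because only 1 of 4 tumor specimens (25%) is quantified. Raising `min_fraction` to 0.50 corresponds to the stricter setting used when one arm is small.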
FIGURE MS-2

Systems  Analysis 
(GO  Associated  Localizations) 


Cellular  Component  GO  Term  Annotation  Comparison 


Annotations  per  GO  Term 


Legend 
Figure  MS-2 

Systems analysis demonstrating the cellular components with
which the modulated proteins of prostate cancer are associated.
FIGURE MS-3

Systems  Analysis 
(GO  Associated  Processes) 


Legend  Figure  MS-3 

Systems analysis demonstrating the biological processes
associated with the modulated proteins of PrCa.
FIGURE MS-4


Protein  Abundance  Changed  in  Prostate  Biopsies 
(Tumor  vs.  Matched  Uninvolved) 


Protein  ID's 
FIGURE MS-5


Legend 
Figure  MS-5 

After filtering the data ~900 proteins were identified with
<0.1% FDR. Of those, 298 proteins were found to be
identified in >50% of patient specimens for each arm. We
have found that at least 50% of samples per statistical arm
must have quantifiable peptides when one arm is limited, as is
the case here. The AA arm was limited to 8 patients and
therefore we had to go with a limit of 4 patients with
quantifiable data in order to obtain robust analysis (we call
this a commonality filter).


TABLE MS-1

10 Proteins Increased in Tumor Tissues of AAs vs EAs

Protein Name | Comments | Accession # | Network Name | SAM | Ttest | Fold (T-AA/EA)
Keratin, type II cytoskeletal 5 | Attaches to K14 to form intermediate filaments and hence cell connections anchored to desmosomes; primarily focused in stratified epithelium | P13647 | Keratin 5 | 0.48 | 0.029 | 1.5
Alpha-actinin-4 | Cancer cell motility Met? | O43707 | Alpha-actinin | 0.62 | 0.024 | 1.6
Ubiquitin-like modifier-activating enzyme 1 | Involved in conjugation to ubiquitin; targeted in cancer | P22314 | UBA1 | 0.52 | 0.030 | 1.7
Alpha-1-antichymotrypsin | Less in prostate cancer; high in pancreatic cancer; lower in cancer than uninvolved; higher in more advanced cancers | P01011 | SERPINA3 (act) | 0.47 | 0.040 | 1.7
40S ribosomal protein S17 | | P0CW22 | RPS17 | 0.58 | 0.046 | 1.7
Zinc-alpha-2-glycoprotein | Stimulates lipolysis; regulated by glucocorticoids; tumor biomarker; cachexia of cancer | P25311 | AZGP1 | 0.54 | 0.023 | 1.8
Galectin-3-binding protein | Cancer associated protein affects motility; metastasis (?) | Q08380 | 90K | 0.60 | 0.022 | 1.8
Malate dehydrogenase 2 | | Q6FHZ0 | MDH2 | 0.58 | 0.035 | 1.9
Prostate-specific antigen | | P07288 | Kallikrein 3 (PSA) | 0.61 | 0.008 | 2.2
Complement component 4A | | B2RUT6 | C4 | 0.63 | 0.058 | 2.7
Of the 10 proteins increased in prostate cancer from AA patients compared to EA
patients,  3  of  the  proteins  may  be  involved  in  the  aggressiveness  of  prostate 
cancer  in  AAs.  We  found  this  encouraging  and  plan  to  increase  the  power  of  the 
study  by  macrodissecting  and  analyzing  additional  cases  by  mass  spectrometry. 


TABLE MS-2

21 Proteins Decreased in Tumor

Protein Name | Accession # | Network Name | SAM | Ttest | Fold (T-AA/C)
Glutaredoxin-1 | P35754 | Glutaredoxin | 0.79 | 0.013 | -2.3
Tubulin beta-6 chain | Q9BUF5 | Tubulin beta | 0.87 | 0.003 | -2.2
Hematological and neurological expressed 1 | Q9H910 | HN1L | 0.60 | 0.016 | -2.0
Cadherin-1 | P12830 | CDH1 | 0.57 | 0.032 | -1.9
Vitronectin | P04004 | Vitronectin | 0.70 | 0.006 | -1.9
Beta-2-microglobulin | P61769 | Beta-2-microglobulin | 0.58 | 0.040 | -1.8
40S ribosomal protein S19 | P39019 | RPS19 | 0.60 | 0.010 | -1.8
Protein canopy homolog 2 | Q9Y2B0 | MSAP | 0.58 | 0.040 | -1.8
Hemoglobin subunit delta | P02042 | Adult hemoglobin | 0.63 | 0.008 | -1.8
N-sulphoglucosamine sulphohydrolase | P51688 | SPHM | 0.70 | 0.007 | -1.8
Brain acid soluble protein 1 | P80723 | BASP1 | 0.49 | 0.039 | -1.8
Ubiquitin-conjugating enzyme E2 | Q15819 | MMS2 | 0.67 | 0.020 | -1.7
Cathepsin Z | Q9UBR2 | Cathepsin Z | 0.69 | 0.016 | -1.7
Laminin subunit alpha-5 | O15230 | LAMA5 | 0.55 | 0.020 | -1.7
Fibrillin-1 | P35555 | Fibrillin | 0.43 | 0.040 | -1.7
Laminin subunit gamma-1 | P11047 | LAMG1 | 0.59 | 0.011 | -1.7
Myosin light chain kinase | Q15746 | MLCK | 0.59 | 0.012 | -1.6
Beta-globin | C8C504 | HBB | 0.56 | 0.014 | -1.6
Protein disulfide-isomerase | P07237 | P4HB | 0.51 | 0.048 | -1.6
Apolipoprotein A-I | P02647 | APOA1 | 0.44 | 0.050 | -1.6
ETHE1 | O95571 | HSCO | 0.49 | 0.042 | -1.5

FIGURE  MS-6 
FFPE Prostate Tissue Biomarkers Compared Between AAs vs EAs

(PrCa  -  p<0.05  for  AAs  vs  EAs  only) 


Legend 
Figure  MS-6 

All  values  are  indicated  for  those  proteins  that  are  significantly  changed  in  PrCa 
tissues  for  AAs  vs  EAs. 


FIGURE  MS-7 
FFPE Prostate Tissue Biomarkers Compared Between AA vs. C

(PCa - p<0.05 for AA vs. C only)


Legend 
Figure  MS-7 

All  values  are  indicated  for  those  proteins  that  are  significantly  changed  in 
PrCa  tissues  for  AAs  vs  EAs. 
FIGURE MS-8


Fibrillin-1 Levels in FFPE Prostate Tissues Compared Between AAs vs EAs

(PrCa - p<0.05 for AAs vs EAs)

Legend
Figure MS-8

Fibrillin-1 is one example worth highlighting among the
proteins that are significantly changed in PrCa tissues for
AAs vs EAs; all values are indicated.


FIGURE  MS-9 
STRAP  GO  Analysis  for 
Significantly  Changed  Proteins 
(PrCa  -  AAs  vs  EAs) 


Cellular  Component  GO  Term  Annotation  Comparison 


Annotations  per  GO  Term 


Legend 
Figure  MS-9 

Systems  analysis  demonstrating  the  cellular  components  of 
modulated  proteins  of  PrCa  between  AAs  vs  EAs. 


FIGURE  MS-10 
STRAP  GO  Analysis  for 
Significantly  Changed  Proteins 
(PrCa  -  AAs  vs  EAs) 


Molecular  Function  GO  Term  Annotation  Comparison 


Annotations  per  GO  Term 


Legend 
Figure  MS-10 

Systems  analysis  demonstrating  the  molecular  function  of 
modulated  proteins  of  PrCa  between  AAs  vs  EAs. 


FIGURE  MS-11 
STRAP  GO  Analysis  for 
Significantly  Changed  Proteins 
(PrCa  -  AAs  vs  EAs) 


Biological  Process  GO  Term  Annotation  Comparison 


Annotations  per  GO  Term 
Legend
Figure  MS-11 

Systems  analysis  demonstrating  the  biological  process  of 
modulated  proteins  of  PrCa  between  AAs  vs  EAs. 


FIGURE  MS-12 
Network  1.  SERPINA3  (ACT), 
Oncostatin  M,  BMP7,  IL-6, 
Thrombopoietin 
Legend
Figure MS-12

> Immune response
> Defense response
> Response to stimulus

This is an example of a systems analysis for SERPINA3.
[Figure panel labels: Biopsy diagnosis: No PrCa (N = 43); West African ancestry 0.74 to 0.50; High grade PrCa (Gleason sum 7 or more); Low grade PrCa (Gleason sum 6); FASN]


GE  Figure  1:  Outlier  gene  expression  patterns  of  FABP5  and  FASN  in  PrCa  biopsies  from  AA 
and  EA  subjects 

In prostate biopsies from our study subjects, FABP5 and FASN mRNA
expression showed patterns consistent with "outlier" PrCa subtypes. Further analysis of PrCa gene
expression data available through Oncomine showed similar outlier patterns for FABP5 and FASN in
at least four other independent studies. In our study subjects, the subgroup of cancers showing
FABP5 super over-expression was predominantly of AA origin, while the cancers showing FASN super
over-expression were predominantly from EAs.


[Figure key: African American; European American. Panel label: Top quartile]


GE  Figure  2:  Super-overexpression  (10  fold  or  greater)  may  define  two  PrCa  subgroups  with 
different  molecular  phenotypes  for  fatty  acid  processing/synthesis.  Top  quartile  mRNA 
expression  patterns  for  FABP5  and  FASN  in  biopsies  from  AA  and  EA  subjects  are  consistent 
with  two  PrCa  subtypes  that  show  an  "either-or"  super-overexpression  at  the  mRNA  level.  In 
our  study  subjects,  AA  prostate  cancers  predominate  in  the  subgroup  showing  the  highest 
levels  of  FABP5  overexpression  and  EA  prostate  cancers  in  the  subgroups  showing  the  highest 
levels  of  FASN  overexpression. 
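
The "either-or" grouping described in this legend can be sketched as a simple rule. This is a hedged illustration only: the function name `classify_outlier` and the toy fold values are hypothetical, and the reference against which the fold change is measured is an assumption, since the legend states only the 10-fold threshold.

```python
def classify_outlier(fabp5_fold: float, fasn_fold: float, threshold: float = 10.0) -> str:
    """Label a tumor by which transcript, if any, meets the super-overexpression threshold."""
    fabp5_high = fabp5_fold >= threshold
    fasn_high = fasn_fold >= threshold
    if fabp5_high and fasn_high:
        return "both"
    if fabp5_high:
        return "FABP5-dominant"
    if fasn_high:
        return "FASN-dominant"
    return "neither"


# Invented fold-change pairs (FABP5, FASN) for illustration; not study data.
toy_cases = {"case_1": (14.0, 1.2), "case_2": (0.9, 22.5), "case_3": (2.1, 3.0)}
for case_id, (fabp5, fasn) in toy_cases.items():
    print(case_id, classify_outlier(fabp5, fasn))
```

Tabulating these labels by self-reported race or by estimated ancestry would reproduce the kind of comparison summarized in GE Figures 1 and 2.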
Disparities in Racial/Ethnic Minorities and the Medically Underserved


Title:  Improving  the  accuracy  and  diagnostic  power  of  prostate  biopsy  for  African 
American  patients:  the  Birmingham  Alabama  Prostate  Cancer  (BAPrCa)  Consortium 

Sandra M. Gaston1; Soroush Rais-Bahrami2; Rick Kittles3; Kerry Dehimer1; Dennis Otali2;
Jeffrey W. Nix2; Peter N. Kolettis2; George Adams4; William E. Grizzle2.

1Tufts Medical Center, Boston, MA, 2University of Alabama at Birmingham, Birmingham,
Alabama, 3University of Arizona, Tucson, Arizona, 4Urology Centers of Alabama, Homewood,
Alabama.

Study  Purpose:  Both  incidence  and  mortality  data  show  that  the  burden  of  prostate  cancer 
(PrCa)  is  greater  in  African  Americans  (AA)  than  in  European  Americans  (EA). 

Socioeconomic  factors  contribute  to  this  health  disparity,  but  do  not  fully  account  for 
observations  that  AA  are  more  likely  than  others  to  be  diagnosed  with  more  aggressive  and 
life  threatening  forms  of  PrCa.  Prostate  biopsies  usually  establish  the  diagnosis  of  PrCa  and 
are  used  to  estimate  the  extent  of  the  disease  (based  on  the  number  and  location  of  cores 
with  cancer  and  involvement  of  individual  cores)  and  its  potential  aggressiveness  (based  on 
Gleason  scores).  Health  policy  groups  recommend  that  men  with  limited  low  grade  prostate 
cancer  be  managed  by  active  surveillance  (AS)  rather  than  immediate  surgical  or  radiation 
treatment.  However,  the  standard-of-care  prostate  biopsy  is  limited  by  sampling  error  and 
the  possibility  that  a  high  grade  PrCa  might  have  been  missed  is  a  significant  concern  for 
many  patients  who  are  considering  AS;  this  concern  is  heightened  for  AA  because  of  their 
higher  risk  of  aggressive  disease.  Moreover,  AA  are  more  likely  to  be  diagnosed  with  high 
grade/high  stage  prostate  cancer  that  is  not  treated  surgically  and  thus  not  well  represented 
in  molecular  studies  that  utilize  radical  prostatectomy  specimens.  Our  research  team 
established  the  Birmingham  Alabama  Prostate  Cancer  (BAPrCa)  Consortium  with  a  major 
focus  on  the  molecular  analysis  of  prostate  biopsies  in  order  to  increase  the  clinically 
actionable  information  that  can  be  obtained  from  these  specimens.  We  use  an  ancestry- 
informed  approach  that  is  specifically  designed  to  improve  the  accuracy  and  diagnostic 
power  of  prostate  biopsy  for  AA  patients. 

Experimental  Procedures:  The  BAPrCa  Consortium  implemented  an  innovative  prostate 
biopsy  "tissue  print"  technology  that  permits  collection  of  snap-frozen  nitrocellulose  blots  of 
biopsy  cores  without  diagnostically  compromising  these  specimens.  Tissue  prints  provide 
high  quality  RNA  and  DNA  from  biopsies  from  the  full  range  of  patients,  including  AAs  whose 
cancer  is  too  advanced  at  diagnosis  for  radical  prostatectomy;  this  permits  the  molecular 
characterization  of  PrCa  subtypes  in  men  diagnosed  with  high  volume/high  grade  disease 
who  have  not  been  adequately  represented  in  previous  molecular  profiling  studies.  Our 
BAPrCa  research  protocols  include  informed  consent  for  genetic  ancestry  admixture  studies. 
Gene expression analysis of prostate biopsy tissue prints is correlated with histopathology
and multiparametric prostate MRI.

Results: Our data suggest that in the Birmingham area, higher prostate cancer risk in AA is
associated with an increasing proportion of West African (WA) ancestry, which may reflect the
prevalence  of  population-specific  genetic  mutations  or  variations  that  contribute  to  the 
development  of  more  aggressive  disease.  As  a  group,  the  men  diagnosed  with  high  grade 
PrCa  showed  a  significantly  higher  level  of  WA  ancestry  than  the  men  diagnosed  with  no 
cancer  (P  =  0.001).  A  similar  pattern  is  observed  in  comparisons  of  AA  men  diagnosed  with 
high grade cancer vs low grade PrCa. Inasmuch as our AIMs genotyping panel uses a small
number of well-established AIMs markers, our observation of significantly increased risk of
PrCa in AA men with high %WA AIMs ancestry may, if confirmed, have immediate potential
clinical applications for improving prostate cancer screening and active surveillance.
Moreover,  gene  expression  profiles  of  biopsies  from  BAPrCa  patients  diagnosed  with  high 
volume/high  grade  PrCa  revealed  two  subtypes  of  high  grade  PrCa  with  striking  differences 
in  the  pathways  that  drive  a  shift  in  tumor  fatty  acid  metabolism;  one  is  a  fatty  acid 
synthase  (FASN)  dominant  phenotype  and  the  other  a  previously  unrecognized  fatty  acid 
binding  protein  (FABP5)  dominant  phenotype.  Our  data  suggest  that  the  FABP5  dominant 
PrCa  subtype  is  more  common  in  AA  and  the  FASN  dominant  subtype  more  common  in  EA. 
These  findings  may  provide  the  basis  for  more  effective  dietary  interventions  and  targeted 
therapies  for  AA  and  EA  patients  with  high  grade  PrCa. 

Conclusions:  By  utilizing  innovative  tissue  print  techniques  for  the  molecular  analyses  of 
prostate  biopsies  and  using  an  ancestry  informed  approach  in  our  study  designs,  the 
Birmingham  Alabama  Prostate  Cancer  (BAPrCa)  Consortium  has  identified  new  and 
potentially  actionable  PrCa  signatures  that  may  improve  the  accuracy  and  diagnostic  power 
of  prostate  biopsy  for  AA  patients. 
Combined DNA-methylation intensity and clinical risk score stratifies patients for high-grade
disease.

Leander  Van  Neste,  Grant  Stewart,  Sandra  Marlene  Gaston,  William  E.  Grizzle,  George  W.  Adams,  Gary  P 
Kearney,  Jonathan  I  Epstein,  David  James  Harrison,  Alan  W.  Partin,  Wim  Van  Criekinge;  Maastricht  University 
Medical  Center,  Maastricht,  Netherlands;  University  of  Edinburgh,  Edinburgh,  United  Kingdom;  Tufts  Medical 
Center, Boston, MA; Department of Pathology, University of Alabama at Birmingham, Birmingham, AL;
Urology Centers of Alabama, Urology, Homewood, AL; New England Baptist Hospital, Boston, MA; Johns
Hopkins  University  School  of  Medicine,  Baltimore,  MD;  University  of  St.  Andrews  School  of  Medicine,  St. 
Andrews,  United  Kingdom;  The  Johns  Hopkins  Hospital,  Baltimore,  MD;  University  of  Ghent,  Ghent,  Belgium 


Abstract  Text: 

Background: Prostate cancer (PCa) diagnostics remains challenging due to fear of over-diagnosis and
overtreatment. Due to the low accuracy of PSA, too many men are biopsied who do not have a subsequent PCa
diagnosis or who have indolent disease. Furthermore, persistent risk factors and fear of missed PCa lead
to many unnecessary repeat biopsies. Most prostate tumors have epigenetic DNA-methylation aberrations,
which display a field effect that can be observed in normal-appearing surrounding tissue, and that could
help alleviate biopsy-sampling errors. Methods: A training cohort of methylation-positive men with a
negative index biopsy followed by either a Gleason score (GS) ≥7 (n=43) or cancer-negative (n=226) repeat
biopsy was evaluated. Using the initial negative biopsy, men were stratified for the likelihood of harboring
high-grade PCa focusing on a methylation intensity algorithm involving GSTP1, RASSF1 and APC. This
algorithm was validated in a cohort of 102 men with either a PCa-free (n=20), GS6 (n=46), or GS≥7 (n=36)
biopsy. Results: The methylation intensity-based algorithm was developed on PCa-negative index
biopsies and optimized to predict the presence of GS≥7 cancer in a repeat biopsy. The methylation
intensity was significantly higher in GS≥7 compared to PCa-free repeat biopsies (p<0.001). Men with GS6
PCa detected upon repeat biopsy exhibited intermediate intensities. When combined into one model with
clinical risk factors (age, pathology, DRE, PSA), an area under the curve (AUC) of 0.762 was obtained,
which was significantly higher than the AUC of PSA (0.574; p=0.004) or the AUC of the clinical risk as
calculated by the PCPT risk calculator (0.618; p=0.029). In the validation set, an AUC of 0.818 was
obtained, with higher intensities for men with GS≥7 disease compared to men with GS6 PCa (p=0.002).
Conclusions: The risk score can identify clinically significant cancer in PCa-negative biopsies and is
strongly correlated with the GS of PCa-positive biopsies. The risk score could better stratify men for the
need for repeat biopsy and the risk of harboring occult clinically significant PCa. The same algorithm could
be used to segregate likely under-graded men from active surveillance candidates.
EVALUATION OF RACIAL DISPARITIES ON PROSTATE CANCER
DETECTION  ON  MRI/US  FUSION-GUIDED  PROSTATE  BIOPSIES 

Patrick Guthrie MD1, Vidhush Yarlagadda MD1, Jennifer Gordetsky MD1,2, John
Birmingham, Birmingham, AL; Department of Pathology and Laboratory
Medicine, Tufts Medical Center, Boston, MA

Objectives:  Significant  racial  disparities  exist  between  African  American  (AA) 
and  non-African  American  men  in  Gleason  Grade  at  the  time  of  prostate  cancer 
(PCa)  diagnosis.  To  better  characterize  this  disparity  we  used  multiparametric 
magnetic  resonance  imaging  (mpMRI)  and  targeted  biopsies  as  a  tool  to  assist  in 
PCa  detection. 

Methods:  Between  January  2014  and  August  2015,  177  patients  who  underwent 
mpMRI  and  MRI/ultrasound  (US)  fusion  guided  prostate  biopsy  and  concurrent 
12-core  biopsy  were  reviewed.  They  were  stratified  by  race  but  also  protocol 
entry  criteria:  (1)  prior  negative  prostate  biopsy,  (2)  active  surveillance  protocol, 
or (3) primary biopsy evaluation for abnormal DRE or elevated PSA. MRI studies
with T2-weighted, diffusion-weighted, and dynamic contrast enhancement
sequences were evaluated and areas of suspicion were identified. Patients
underwent MRI/US fusion biopsies of targets and concurrent standard 12-core
biopsy. The number of targets with PCa, number of standard biopsies with PCa,
grade identified, and distribution of tumors were calculated.

Results  Obtained:  In  our  study,  38  AA  males  and  139  non-AA  males  underwent 
MRI/US  fusion  biopsies.  PSA,  age,  and  cancer  detection  on  standard  biopsy  were 
not significantly different between groups. AA and non-AA men had a mean of
2.58 and 2.74 targets identified, respectively (p=N.S.). The efficacy of targeted
biopsy vs standard biopsy in detection of PCa and higher grade disease was
equivalent between AA and non-AA males (p=N.S.). When both targeted cores
and  standard  cores  found  PCa,  standard  cores  in  AA  males  showed  higher  grade 
PCa  than  targeted  cores  (p<0.001). 


Conclusions:  African  American  males  have  been  shown  to  have  higher  risk  of 
PCa  and  higher  grade  disease,  but  in  our  patient  cohort  undergoing  MRI/US 
fusion-guided  biopsy,  cancer  detection  stratified  by  grade  was  equivalent.  In 
patients  with  PCa  found  on  both  standard  and  targeted  biopsy  techniques,  AA 
patients had higher grade disease on standard biopsy cores, likely a result of the
distribution of AA patients referred with already diagnosed PCa on AS,
suggesting  a  selection  bias  favoring  the  posterior  peripheral  zone  location  of  their 
tumors. 
Translational Relevance

Biomarkers  are  used  in  early  detection,  diagnosis,  prognosis  and  risk  assessment,  in  predicting  responses 
to  specific  therapies  and  in  evaluating  therapeutic/preventive  approaches  (surrogate  endpoints).  Before 
biomarkers  can  be  used  clinically,  they  must  be  validated;  however,  in  cancer  there  are  few  validated 
biomarkers  for  any  of  the  above  uses.  Validation  is  a  process  that  is  not  well  understood  by  investigators 
and  frequently  biomarkers  are  described  as  validated  when  they  have  only  begun  validation.  This 
manuscript  describes  and  discusses  a  well-defined  pathway  with  the  steps  that  are  necessary  for  validation 
of  a  biomarker  for  a  specific  use  over  a  defined  interval  of  time.  This  manuscript  will  aid  investigators  in 
the  validation  of  their  biomarkers,  will  clarify  approaches  needed  for  validation  and  will  reduce  the  waste 
of  resources  for  biomarkers  that  appear  to  be  not  strong  enough  to  be  validated  for  a  specific  use. 




Validation  of  molecular  biomarkers 


Abstract 

Molecular biomarkers are required for improving the assessment of risk of disease, establishing the
existence of disease, determining prognosis and treatment, and implementing personalized medicine,
and their clinical validation is a key step in translational medicine. Although many published papers claim
to  report  clinically  useful  prognostic  biomarkers,  there  are  embarrassingly  few  validated  cancer 
prognostic  biomarkers.  There  are  many  reasons  for  this  situation,  one  of  which  is  that  researchers  may  not 
fully  appreciate  the  subtleties  of  molecular  biomarkers  and  may  not  follow  the  rigorous  procedures  that 
are  necessary  to  translate  basic  scientific  findings  to  the  clinic.  We  propose  a  straightforward  approach  to 
validating  a  biomarker  using  a  well-defined,  three-stage  method.  The  stages  are:  1)  identification, 
characterization,  and  evaluation,  2)  data  and  model  testing,  and  3)  independent  prospective  replication  of 
results.  Also  discussed  are  several  important  issues  affecting  the  validation  of  biomarkers  such  as 
statistical  model  stability,  the  definition  of  clinical  events,  and  combining  molecular  biomarkers  into 
signatures  and  pathways.  The  goal  of  this  manuscript  is  to  clarify  the  process  of  validation  and  to  provide 
guidance  to  investigators  performing  translational  biomarker  research. 
Introduction

Molecular biomarkers are required for improving the assessment of risk of disease, establishing
the existence of disease, determining prognosis and treatment, and implementing personalized
medicine (1-2), and their clinical validation is a key step in translational medicine (3-6). Validation is a
rigorous  process  that  requires  a  deep  understanding  of  molecular  biomarkers  and  their  relationship  to 
disease  and  an  appreciation  of  the  complexities  inherent  in  their  identification,  testing,  and  replication  (4). 

In  the  past  twenty  years  there  has  been  an  exponential  increase  in  molecular  biomarker  research 
with  thousands  of  new  gene  and  protein  biomarkers  reported  each  year.  At  last  count  there  are  over  five 
hundred  thousand  papers  indexed  in  PubMed  for  gene,  protein,  and  molecular  biomarkers.  Although 
many  of  these  papers  claim  to  report  clinically  useful  prognostic  biomarkers,  there  are  embarrassingly 
few  validated  cancer  prognostic  biomarkers  (7-10).  There  are  many  reasons  for  this  situation  (8-10),  one 
of  which  is  that  researchers  may  not  fully  appreciate  the  subtleties  of  molecular  biomarkers  and  may  not 
follow  the  rigorous  procedures  that  are  necessary  to  translate  basic  scientific  findings  to  the  clinic  (8-10). 
The result is studies replete with errors and a literature that contains incorrect, and many times even
contradictory, results (8-10). Because biomarkers are central to translational medicine, a failure to properly
understand, assess, and utilize them has prevented their use in treatment, comparative benefit analyses,
and in integrating individualized patient outcomes in clinical decision-making (8-11).

The  validation  of  molecular  biomarkers  has  been  a  concern  since  the  earliest  days  of  molecular 
research.  Over  the  last  twenty  years  significant  problems  have  been  noted,  and  recommendations 
regarding  solving  these  problems  have  been  made  (12),  but  few  of  these  proposals  have  been  adopted. 
Pepe et al. (13) proposed a model for clinical validation of biomarkers for the early detection of disease,
yet  subsequent  publications  on  early  detection  suggest  that  the  confusion  did  not  recede  after  this 
publication  (14-17). 

This  manuscript  proposes  a  straightforward,  general  method  for  validating  biomarkers  to  assist 
investigators  in  their  validation  of  molecular  biomarkers.  Because  of  the  inherent  complexity  in  analyzing 
biomarkers  and  the  dynamic  nature  of  the  field,  commentaries  and  general  guidelines  are  provided. 

Validation  of  Biomarkers 

A  molecular  biomarker  can  be  said  to  have  been  validated  if  it  has  been  shown  in  an  independent 
prospective  replication  study  to  reliably  and  accurately  predict  a  specific  outcome  in  a  specified  patient 
population  over  a  defined  time  interval  (1,3-4).  At  a  minimum,  a  validated  biomarker  consists  of  a  set  of 
necessary  and  sufficient  characteristics  that  uniquely  identify  the  biomarker  and  includes  the  following:  a 
detection  and  analysis  protocol  that  results  in  high  inter-laboratory  agreement,  a  defined  target  patient 
population,  a  trained  statistical  model,  i.e.,  a  model  whose  parameters  have  been  defined  by  the  data  that 
contains  the  biomarker,  other  relevant  factors  and  the  outcome  of  interest,  and  a  quantitative  statement  of 
the accuracy of the biomarker at predicting the outcome of interest in the target population over the
specified time interval (1,4).

For  the  purposes  of  this  discussion,  “molecular”  refers  to  any  sub-cellular  factor,  including 
proteogenomic,  transcriptional,  and  metabolic  factors  (18).  “Biomarker”  refers  to  both  individual  and 
combinations  of  biological  factors,  including  panels,  patterns,  profiles,  pathways,  and  signatures  that  are 
used  to  predict  one  of  three  outcomes,  namely,  risk  of  disease,  the  existence  of  disease,  and  prognosis  (1). 
There  are  three  types  of  prognostic  biomarkers,  defined  in  terms  of  their  use,  namely,  natural  history, 
which  predicts  the  course  of  the  disease  if  the  patient  never  receives  a  therapy;  therapy-specific,  which 
predicts  whether  a  particular  therapy  will  benefit  the  patient;  and  post  therapy,  which  predicts  that  the 
therapy  the  patient  received  benefited  the  patient  (4).  “Outcome”  is  the  clinical  event  of  interest,  e.g., 
incident  disease,  response  to  therapy,  recurrence,  or  death.  Although  knowledge  of  the  biological  function 
of  a  molecular  biomarker  can  provide  important  basic  science  information,  functional  information  is  not 
necessary  for  using  a  biomarker  to  predict  a  clinical  outcome  (2). 
Predictive accuracy refers to the relationship of a predicted value to a true value for each patient,
across  a  population  of  patients.  It  has  two  components,  discrimination  (the  correct  ordering  of  the 
predictions)  and  calibration  (how  close  the  predicted  value  is  to  the  true  value).  A  very  useful  measure  of 
discriminative  accuracy  is  the  receiver  operating  characteristic  (ROC)  method  (19-21).  We  are  less 
interested  in  the  calibration  of  the  model  than  the  correct  ordering  of  the  predictions  because  poorly 
calibrated  models  can  be  corrected  by  performing  a  post-processor  calibration  (22)  but  there  can  be  no 
recovery  from  the  low  accuracy  of  a  model  that  poorly  discriminates. 
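
The distinction between discrimination and calibration can be made concrete with a small sketch. It is illustrative only: the function names and the toy predictions are hypothetical, the area under the ROC curve is computed from its rank interpretation (the probability that a randomly chosen case with the outcome receives a higher prediction than a randomly chosen case without it), and the calibration measure shown is the simple difference between mean prediction and observed event rate.

```python
from typing import Sequence


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as the fraction of positive/negative pairs ranked correctly (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def calibration_in_the_large(pred: Sequence[float], labels: Sequence[int]) -> float:
    """Mean predicted probability minus observed event rate (0 means well calibrated on average)."""
    return sum(pred) / len(pred) - sum(labels) / len(labels)


# Invented predictions and outcomes for illustration only.
labels = [0, 0, 1, 0, 1, 1]
pred = [0.10, 0.20, 0.30, 0.35, 0.60, 0.90]
shrunk = [p / 3 for p in pred]  # monotone rescaling: same ordering, poorer calibration

print("AUC original:", roc_auc(pred, labels))
print("AUC rescaled:", roc_auc(shrunk, labels))
print("Calibration original:", calibration_in_the_large(pred, labels))
print("Calibration rescaled:", calibration_in_the_large(shrunk, labels))
```

Because the rescaling preserves the ordering of the predictions, the AUC is unchanged while the calibration measure moves further from zero. This is the asymmetry the paragraph describes: calibration can be adjusted after the fact without affecting discrimination, whereas a model that orders patients poorly cannot be repaired by recalibration.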

Three  Stages  of  Validation 

We  propose  three  stages  to  biomarker  validation:  1)  identification,  characterization,  and 
evaluation,  2)  data  and  model  testing,  and  3)  independent  prospective  replication  of  results  (Table  1)  (4). 
Each  stage  must  be  successfully  completed  before  moving  to  the  next  stage.  Prior  to  beginning  the 
clinical  validation  process  the  investigator  should  be  satisfied  that  the  biomarker  has  the  potential  to 
answer  an  important  clinical  question.  In  other  words,  does  the  biomarker  appear  to  be  related  to  the 
disease,  does  the  relationship  appear  to  be  very  strong,  and  could  use  of  the  biomarker  have  an  impact  on 
patient  outcomes? 

For  diseases  in  which  the  outcomes  are  easily  predicted,  no  additional  biomarkers  are  needed,  and 
for  diseases  where  there  is  no  effective  treatment,  biomarkers  will  have  little  clinical  utility.  Additionally, 
the  investigator  should  consider  whether  the  biomarker  is  suitable  for  clinical  use;  in  other  words,  is  it 
relatively  easily  acquired  and  analyzed,  is  the  analysis  reproducible  across  laboratories,  and  is  the 
acquisition and analysis of the biomarker relatively inexpensive? Further, will the biomarker be applicable
to  a  sufficiently  large  number  of  patients  so  that  its  validation  will  make  a  clinical  difference?  Finally,  the 
candidate  biomarker  should  be  examined  in  terms  of  whether  it  could  add  predictive  accuracy  when  used 
with the current biomarkers and whether it could eliminate one or more of the currently used
biomarkers.  If  it  neither  adds  predictive  accuracy  nor  eliminates  a  current  biomarker,  then  it  is  probably 
unnecessary.  If  the  researcher  believes  that  the  evidence  suggests  that  all  these  issues  will  be  resolved  in 
favor  of  the  biomarker,  then  the  validation  process  should  proceed. 

The  validation  of  molecular  biomarkers  should  progress  through  three  stages  (4).  Stage  1  has 
three  components:  identification  of  the  biomarker,  characterization  of  the  biomarker  in  terms  of  its 
specimen  acquisition  and  analysis  in  its  target  population,  and  creation  and  evaluation  of  a  multivariate 
supervised  learning  statistical  model  to  determine  the  predictive  power  of  the  biomarker  for  a  specific 
outcome  over  a  specified  time  interval. 

During  stage  1,  the  investigator  learns  about  the  biomarker,  including  assessing  the  practicality  of 
the  acquisition  of  the  biological  specimen,  its  accuracy  in  various  clinical  populations,  trying  different 
statistical  methods,  determining  a  threshold  (cut-off  point  for  a  continuous  variable)  for  the  biomarker, 
and  examining  the  effects  of  confounders  on  the  biomarker’s  accuracy.  Most  of  the  theoretical,  biological, 
and  experimental  work  related  to  the  clinical  validation  of  a  molecular  biomarker  occurs  in  the  first  stage 
and the determination is made in this stage as to whether the biomarker is sufficiently accurate so that it
warrants  proceeding  to  the  next  stage  in  the  validation  process. 

Stage  2,  data  and  model  testing,  takes  the  final  results  of  the  first  stage  and  attempts  to  implement 
and  test  them  on  another  independent  dataset  from  a  different  institution.  This  is  an  important  stage 
because  it  reveals  many  of  the  unrecognized  assumptions  and  biases  that  existed  in  the  first  stage.  Stage  3, 
replication  of  results,  is  the  critical  stage  since  the  clinical  utility  of  the  biomarker  is  established  in  this 
stage.  Until  the  biomarker  successfully  completes  the  third  stage,  which  requires  an  independent 
investigator,  independent  laboratory,  and  independent  prospectively  collected  patient  population,  there  is 
insufficient  evidence  that  it  can  be  applied  to  an  important  clinical  problem. 

Stage  1,  Identification,  Characterization,  and  Evaluation  (ICE):  In  the  ICE  stage,  the 
investigator  selects  and  assesses  a  biomarker.  There  is  no  restriction  as  to  how  the  biomarker  is 
discovered.  One  of  the  first  steps  in  identifying  a  potential  risk  or  diagnostic  biomarker  is  to  determine  if 
the biomarker is expressed differentially in diseased versus non-diseased tissue, in other words, is it
specific  to  the  disease?  The  next  step  is  to  assess  the  biomarker’s  relationship  to  an  outcome,  i.e.,  risk  of 
disease,  diagnosis,  or  prognosis  (1).  Overall,  there  should  be  some  evidence  that  the  biomarker,  as 
measured  in  solid  tissue  or  in  bodily  fluids,  is  associated  with  the  clinical  outcome. 

At  the  completion  of  this  stage  the  biomarker  should  be  described  in  sufficient  detail  so  that  it  can 
be  unambiguously  and  reproducibly  identified  and  measured  by  other  investigators  (including  the 
acquisition,  storage  and  analysis  of  the  biological  specimen),  its  analysis  is  documented,  a  disease  is 
specified,  a  clinical  population  relevant  to  the  biomarker  is  identified,  a  disease-related  outcome  is 
selected,  and  the  time  interval  during  which  the  biomarker  is  relevant  to  the  outcome  is  provided. 

An  example  of  a  validated  biomarker  is  the  estrogen  receptor  (ER)  of  breast  cancer  (23).  ER  was 
initially  described  in  terms  of  its  measurement  by  radioimmunoassay,  the  specimen  of  malignant  tissue 
and  controls  in  which  it  was  measured,  the  method  of  data  analysis,  the  biomarker’s  relevance  to  a 
population of women with non-metastatic breast cancer, its disease-related outcome, e.g., mortality due to
breast cancer, and the time interval, e.g., 10-year disease-specific mortality, with ER expression associated with better survival (24). The
presence of a certain level of ER expression in the tumor predicts that anti-hormonal therapy will be
effective in reducing women’s probability of a recurrence and of dying from their breast cancer (23,25).

Even  though  a  single  biomarker  may  be  the  primary  focus  of  the  validation,  its  clinical  use  will 
invariably  rely  on  a  multivariate  model  because  the  model  must  contain  all  the  predictively  relevant 
factors  so  that  it  can  make  accurate  predictions  (26-27).  The  goal  of  the  model  will  be  to  contain  all  the 
independent,  orthogonal  predictors  of  the  outcome.  Further,  the  multivariate  model  will  usually  be  related 
to  an  effective  treatment,  e.g.,  antihormonal  therapy  for  ER  expressing  breast  cancers,  so  that  the 
biomarker  predicts  which  patients  will  or  will  not  respond  to  a  specific  therapy  (28). 

An  initial  approach  to  the  analysis  is  to  create  a  dataset  containing  patients  to  be  analyzed  for  the 
biomarker and to randomly split the patients into training and testing subsets. The data set is split
because a model developed on a single dataset will always have a high accuracy when it is
assessed using the exact same patients on which it was developed. This high accuracy is due to
overfitting, which reduces the model’s generalizability. Therefore, the accuracy of the model should be
determined on another dataset. Note that splitting the data is less than optimal because
the training and testing data are subsets of the same patient population and contain the same biases. (We
will discuss assessing the model’s accuracy on independent data sets.) The training subset determines the
relationship  between  the  independent  and  dependent  variables  and  establishes  that  relationship  in  a 
statistical  model.  The  test  subset  measures  the  accuracy  of  that  trained  model.  For  large  data  sets  that 
contain  many  clinical  (binary)  events,  e.g.,  dead/alive,  recurrence/no  recurrence,  a  fifty-fifty  split  is 
reasonable.  For  smaller  data  sets  the  more  important  component  is  the  correct  modeling  of  the  disease 
phenomena,  so  more  data  is  allocated  to  the  training  subset  than  the  testing  subset.  A  useful  heuristic  for 
small  datasets  is  to  split  the  data  into  two-thirds  to  three-fourths  for  training  and  one-fourth  to  one-third 
for  testing  (28). 
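The random split described above can be sketched as follows. The function name, the fixed seed, and the example of 120 patients are illustrative assumptions; only the split fractions (one-half for large datasets, two-thirds to three-fourths for small ones) come from the text.

```python
import random


def split_patients(patient_ids, train_fraction, seed=0):
    """Randomly partition patient IDs into training and testing subsets."""
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be strictly between 0 and 1")
    shuffled = list(patient_ids)
    # A seeded generator makes the split reproducible (the seed is arbitrary).
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


ids = list(range(120))                                   # invented small dataset
train, test = split_patients(ids, train_fraction=2 / 3)  # small-dataset heuristic
print(len(train), len(test))
```

This sketch does not stratify by outcome; with few events, an investigator may prefer to split events and non-events separately so that both subsets contain events.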

The  biomarker  is  modeled  using  an  appropriate  statistical  method  on  the  training  dataset  and  its 
accuracy  is  tested  on  the  testing  dataset.  During  this  stage,  the  investigator  has  knowledge  of  each 
patient’s  outcome  and  may  examine  the  data,  assess  various  statistical  methods,  add  or  remove 
biomarkers,  and  modify  the  analysis  in  any  way.  Various  thresholds  can  be  tested  and  the  best  one 
selected.  There  are  no  limitations  on  what  may  be  done  with  the  data  or  how  the  results  are  analyzed 
during  this  stage. 

The  discriminative  accuracy  of  the  model  that  contains  the  biomarker  as  a  variable  is  measured 
on  the  testing  dataset  by  the  receiver  operating  characteristic  (ROC).  This  is  a  critical  juncture,  for  it  is 
here  that  investigators  can  take  a  wrong  turn.  There  is  an  inclination  to  believe  that  the  results  obtained  on 
the  ICE  testing  dataset  have  clinical  meaning,  but  they  do  not  because  the  investigator  has  optimized  the 
biomarker,  examined  and  manipulated  the  data  and  the  analysis,  looked  at  the  results,  and  through  trial 
and  error,  determined  the  best  threshold,  patient  population,  statistical  model,  and  outcome  for  the 
biomarker. The biomarker’s accuracy on an ICE dataset is not a valid measure of the biomarker’s clinical
utility  because  this  stage  has  the  potential  to  produce  overly  optimistic  and  biased  results.  So  far  the 
investigator does not have valid results and neither the model nor its ROC developed in the ICE stage
should  be  presented  or  published. 

How to report studies of biomarkers used as prognostic factors is beyond the scope of this paper;
however, other publications have addressed reporting prognostic biomarkers, including REMARK, a
checklist of 20 items (truncated to 11 items by some journal editors) that can be used to determine if a
study of prognostic factors should be published (27,29-30), and STROBE-ME (31), which provides guidance
on  reporting  observational  molecular  epidemiology  studies. 

Focusing on an understanding of the scientific process regarding biomarker validation, which is the
goal  of  this  paper,  can  be  more  useful  to  an  investigator  than  performing  a  study  guided  by  whether  it  will 
meet  a  set  of  publication  criteria.  In  other  words,  although  problems  with  performing  a  study  and 
problems  with  its  publication  can  overlap,  if  the  performance  of  a  study  is  scientifically  valid  there  should 
be  few  reporting  problems,  whereas,  if  the  study  is  incorrectly  designed  and  performed  no  publication 
guidance  can  save  it. 

Further, there are elements in some reporting approaches that may decrease rather than increase
the quality and validity of publications on biomarkers. One example of this difficulty is the statement by
Altman (27) that it is permissible to publish results after the investigator has looked at the data and used
the resulting information to plan key features of the analysis to be performed using the same data. Our
view  is  that  data  can  only  be  looked  at  in  the  ICE  stage,  and  then  only  with  the  understanding  that  the 
resulting  ICE  finding  cannot  be  published. 

Typically,  the  accuracy  of  a  biomarker  decreases  as  it  progresses  through  the  validation  stages.  At 
the  end  of  the  validation  process  it  must  retain  sufficient  accuracy  to  be  clinically  useful.  In  other  words, 
the  accuracy  observed  in  the  ICE  stage  will  almost  always  be  higher  than  the  final  validated  accuracy  of 
the  biomarker.  In  order  to  save  a  biomarker  investigator  time  and  resources  we  suggest  the  following 
approach.  A  validated  biomarker  should  have  an  ROC  of  at  least  0.65  (assuming  a  standard  deviation  of 
0.05  or  less)  to  be  clinically  useful  (2,32).  Experience  suggests  that  a  biomarker  will  lose  between  0.3  and 
0.5 of its discriminative accuracy as it progresses through the stages of validation. Therefore, the
minimum  ROC  for  a  biomarker  to  move  from  the  ICE  stage  to  the  next  stage  should  be  0.75.  The 
relevance  of  these  numbers  will  become  more  apparent  in  the  clinical  utility  section  of  this  paper. 

Stage  2,  Data  and  Model  Testing  (DMT):  In  this  stage,  the  investigator  uses  the  final 
characterization  of  the  biomarker  and  statistical  model  derived  from  the  ICE  stage  to  test  the  biomarker. 
The  researcher  collects  a  new  independent  patient  dataset  (DMT  dataset)  from  another  investigator  at  a 
different  institution  (33).  It  includes  the  defined  target  patient  population  and  appropriate  biological 
samples  for  the  measurement  of  the  biomarker.  The  biomarker  characteristics  were  determined  based  on 
the  ICE  study.  The  investigator  then  tests  the  ICE’s  final  statistical  model  on  the  DMT  dataset  of  patients. 
The  critical  component  of  this  stage  is  the  proper  application  of  the  final  methods  and  results  from  the 
ICE  stage  to  the  new  DMT  patient  population.  The  trained  statistical  model  from  the  ICE  stage  can  be 
tested only once on the DMT dataset. The DMT patients are run through the predictive model and the
probability of the outcome over the defined time interval for each patient is determined. The predicted
outcomes  are  compared  to  the  true  outcomes  and  the  predictive  accuracy  of  the  model  is  determined  and 
reported  in  terms  of  the  model’s  ROC  on  the  DMT  patients.  The  results  must  be  sufficiently  accurate  to 
justify  moving  to  the  third  stage  of  the  validation  process.  In  this  case,  the  minimum  ROC  required  to 
progress  to  the  next  stage  is  0.70.  If  the  biomarker  does  not  achieve  an  acceptable  accuracy  in  the  DMT 
stage  the  investigator  should  determine  if  this  failure  was  due  to  one  of  the  following:  the  characterization 
and  analysis  of  the  biomarker,  the  statistical  model,  the  characteristics  of  the  patient  population,  the 
treatments  included  in  the  analysis,  the  conditions  of  the  study,  or  other  factors.  The  researcher  can  return 
to  the  ICE  stage  at  any  time,  improve  the  biomarker  or  the  model,  and  retest  once  on  the  DMT  dataset.  If 
the  researcher  uses  the  results  of  the  DMT  stage  to  improve  the  performance  of  the  biomarker,  then 
another  independent  patient  population  must  be  obtained  for  the  DMT  stage  (labeled  DMT2  dataset). 
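The accuracy gates quoted in the text (an ROC of 0.75 to leave the ICE stage, 0.70 to leave the DMT stage, and 0.65 for a validated biomarker) and the rule that a frozen model is tested only once on a DMT or ROR dataset can be written down as a small bookkeeping sketch. The class and function names are hypothetical, and the guard takes a deliberately strict reading of the single-use rule.

```python
# Minimum ROC values quoted in the text for each stage.
MIN_ROC = {"ICE": 0.75, "DMT": 0.70, "ROR": 0.65}


def passes_gate(stage, observed_roc):
    """True if the observed ROC meets the stated minimum for this stage."""
    return observed_roc >= MIN_ROC[stage]


class SingleUseDataset:
    """Hypothetical guard: a DMT or ROR dataset scores the frozen model once."""

    def __init__(self, stage, name):
        if stage not in ("DMT", "ROR"):
            raise ValueError("only DMT and ROR datasets are single-use")
        self.stage = stage
        self.name = name
        self.used = False

    def evaluate(self, observed_roc):
        """Record the one permitted result and report whether the gate is met."""
        if self.used:
            raise RuntimeError(
                f"{self.name} has already been used; "
                "a new independent dataset is needed"
            )
        self.used = True
        return passes_gate(self.stage, observed_roc)


# ICE results may be re-examined freely, so no guard is applied there.
print(passes_gate("ICE", 0.78))
dmt = SingleUseDataset("DMT", "DMT dataset")
print(dmt.evaluate(0.71))
```

A second call to `dmt.evaluate` raises an error, mirroring the requirement that a further test needs a new independent dataset (the DMT2 dataset in the text).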

A  successful  evaluation  of  the  biomarker  does  not  mean  that  it  has  been  validated  because  the 
same  investigator  who  performed  the  ICE  stage  also  performed  its  DMT  stage  and  the  dataset  was 
retrospectively created. At this point in the process many unknown and even unanticipated sources of bias
may  exist  that  can  affect  the  power  of  the  biomarker  and  the  performance  of  the  model,  and  therefore  the 
accuracy  and  utility  of  the  biomarker.  For  example,  the  investigator’s  method  of  dataset  collection, 
biomarker  analysis,  and  use  of  the  statistical  model  are  all  subject  to  bias  and  error.  Further,  the  datasets 
used  in  the  first  two  stages  are  usually  retrospective  and  subject  to  all  the  biases  inherent  in  retrospective 
studies.  Positive  results  of  the  DMT  stage  may  be  reported,  but  the  report  should  contain  the  following 
explicit  statement,  “The  reported  biomarker  results  have  not  been  validated  and  the  biomarker  is  not  ready 
for  use  in  clinical  practice.”  It  is  important  that  negative  results  of  the  DMT  stage  also  be  published  (15). 

Stage  3,  Replication  of  Results  (ROR):  Because  the  hallmark  of  science  is  replication,  a 
different  investigator  with  a  prospective,  independently  collected  dataset  should  replicate  the  results  of  the 
DMT  stage.  This  process  is  similar  in  approach  to  the  DMT  stage.  The  final  model  from  the  ICE  stage, 
the  one  that  was  successfully  used  in  the  DMT  stage,  is  applied  only  once  to  the  ROR  dataset.  The  model 
makes  its  predictions  for  the  ROR  patients  and  these  predictions  are  compared  to  the  true  outcomes.  If  the 
ICE  stage  results  were  reproduced  in  the  DMT  stage  but  not  in  the  ROR  stage,  this  suggests  that  either 
there  was  a  bias  in  the  datasets  used  in  one  or  more  of  the  stages  of  the  validation  process  or  there  were 
problems  with  the  performance  of  the  biomarker  assay. 

Clinical  utility  means  that  the  biomarker  improves  the  management  and  outcomes  of  patients 
(30).  Determining  the  potential  clinical  utility  of  a  biomarker  is  a  complex  concept  (34).  It  includes,  but  is 
not  limited  to,  the  acquisition  and  analysis  of  the  biomarker,  the  number  of  patients  with  the  target 
disease,  the  severity  of  the  target  disease,  the  safety  and  efficacy  of  the  treatment,  and  the  accuracy  of  the 
test  in  predicting  a  therapy-specific  benefit.  Herein,  the  discussion  of  clinical  utility  is  limited  and  only 
includes  the  accuracy  of  the  biomarker.  A  necessary  requirement  for  clinical  utility  is  that  the  biomarker 
is significantly more accurate than chance prediction, i.e., an ROC of 0.50; hence the minimum biomarker
accuracy of an ROC of 0.65, because lower accuracies are unlikely to be distinguishable from chance. It should be
noted  that  accuracies  of  at  least  0.70,  and  a  standard  deviation  less  than  0.05,  are  preferred.  Increasing  a 
low  ROC  requires  either  starting  with  a  more  powerful  biomarker  or  reducing  the  variance  of  the 
predictions. 
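The text does not say how the standard deviation of the ROC is estimated. As one possibility, the sketch below uses the widely cited Hanley and McNeil approximation for the standard error of the area under the ROC curve; that choice, the function name, and the sample sizes are assumptions made only for illustration.

```python
import math


def hanley_mcneil_se(roc_area, n_events, n_nonevents):
    """Approximate standard error of an ROC area (Hanley-McNeil formula)."""
    if n_events < 1 or n_nonevents < 1:
        raise ValueError("both groups must contain at least one patient")
    a = roc_area
    q1 = a / (2.0 - a)
    q2 = 2.0 * a * a / (1.0 + a)
    variance = (
        a * (1.0 - a)
        + (n_events - 1) * (q1 - a * a)
        + (n_nonevents - 1) * (q2 - a * a)
    ) / (n_events * n_nonevents)
    return math.sqrt(variance)


# Invented sample sizes: the same ROC of 0.65 estimated from two study sizes.
for n in (50, 100):
    print(n, round(hanley_mcneil_se(0.65, n, n), 4))
```

Under this approximation, an ROC of 0.65 estimated from 50 events and 50 non-events has a standard error slightly above 0.05, whereas 100 of each brings it below 0.05.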

Issues  Related  to  Validating  Biomarkers 

Biomarker  Datasets:  The  datasets  used  for  validation  should  include  the  current  clinical 
predictive  factors,  the  relevant  confounders,  and  the  effective  treatments.  They  should  have  a  sufficient 
number  of  patients  and  events  for  model  stability  (discussed  subsequently),  and  the  patients  should  be 
followed  for  a  sufficient  period  of  time,  defined  by  the  clinical  problem  the  biomarker  is  addressing,  so 
that  the  predictions  are  clinically  meaningful. 

Most  biomarker  studies  are  conducted  using  retrospective  populations.  These  datasets  have  the 
advantages  of  being  readily  available  with  relatively  long  periods  of  follow-up,  thus  making  them  quick 
and  much  less  expensive  to  acquire  and  use.  The  main  disadvantages  of  retrospective  data  sets  are:  1)  they 
may contain biases associated with patient selection, or specimen acquisition and analysis, or treatment,
2) they usually do not contain all the relevant predictors and confounders, i.e., there can be unmeasured
covariates,  3)  they  almost  always  contain  heterogeneous  patient  populations  and  therapies,  4)  not  all  the 
patients  may  have  been  assessed  for  the  candidate  biomarker  (i.e.,  appropriate  biological  samples  may  not 
be  available),  5)  the  therapies  are  not  uniformly  applied  across  patients  resulting  in  a  surprising  number  of 
different  treatment  regimens,  6)  they  may  contain  patients  treated  with  antiquated  therapies  and/or 
inadequate  numbers  of  patients  may  have  been  treated  with  current  therapies,  and  7)  they  typically 
contain  a  great  deal  of  missing  data  which  can  make  them  unsuitable  for  multivariate  analysis. 

A  key  issue  in  retrospective  data  is  the  absence  of  biomarker  values  in  some  of  the  patients.  The 
values could be missing completely at random but this is rarely the case (35). Usually a bias is at work.
The investigator has a number of ways to deal with this problem, including only using the patients who
have  a  biomarker  value,  imputing  a  central  tendency  biomarker  value,  or  finding  the  specimens  and 
assessing the missing biomarker value. In terms of solving the missing data problem, finding the
specimens  and  assessing  the  biomarker  is  usually  the  best  approach.  If  this  is  not  possible,  then 
performing  multiple  imputation  may  be  a  useful  alternative  approach  (36).  In  any  event,  if  the  missing 
biomarker  values  affect  the  validation  results,  for  example,  if  there  is  an  important  bias  at  work  in  the 
data,  it  will  be  discovered  in  Stage  3  by  a  significant  decrement  in  accuracy.  Thus,  retrospective  studies 
are  not  definitive  evidence  of  the  accuracy  and  clinical  utility  of  a  biomarker. 

Prospectively  collected  populations  avoid  many  of  the  weaknesses  inherent  to  retrospective 
studies.  A  prospective  study  follows  a  defined  population,  it  collects  all  the  relevant  variables  and 
samples,  it  implements  uniform  biomarker  detection  methods  and  therapy  regimens,  and  the  patients  have 
been  followed  for  a  pre-specified  period  of  time.  The  major  limitations  of  prospective  studies  are:  they 
have  entry  criteria  that  create  a  relatively  homogeneous  patient  sub-population  (in  part  to  reduce  patient 
variability),  they  require  extensive  financial  and  manpower  resources,  and  they  take  a  long  time  to 
complete.  Further,  they  may  not  be  generalizable  to  most  patients  with  the  disease  because  the  patients  in 
the  study  were  a  special  sub-population,  because  of  the  study’s  tight  clinical  control,  and  because  many 
patients in the real world will not receive the exact therapy offered in the trial. Because of the time and cost of
prospective studies, retrospective studies are usually employed in the ICE and DMT stages, the results of
which  are  used  to  justify  the  time  and  cost  of  a  prospective  replication  study. 

Implicit  in  this  discussion  is  the  knowledge  that  prospective  datasets  are  usually  collected  to 
evaluate  a  specific  therapy.  Their  use  in  the  ROR  stage  is  based  on  the  idea  that  not  all  the  patients  who 
receive  the  therapy  will  respond  and  that  this  differential  clinical  effect  can  be  used  to  define  the  utility  of 
the  biomarker  in  predicting  which  patients  will  respond  to  the  therapy  (therapy-specific  prognosis)  and 
predicting  which  patients,  after  receiving  the  therapy,  responded  to  it  by  a  change  in  the  biomarker  value 
(post-therapy  prognosis)  (1). 

Statistical  Model  Instability:  An  important  consideration  in  building  statistical  models  is  to 
avoid model instability. Model instability occurs when the data do not establish the relationship between the
independent variables and the dependent variable strongly enough. The result is that the model’s
parameter estimates vary over too great a range. It has been suggested that to avoid model instability there
must be at least 10 events (defined subsequently) for each independent variable (37); however, for the
analysis  of  the  predictive  power  of  molecular  biomarkers  15-20  events  provide  a  greater  assurance  of 
model  stability.  With  this  number  of  events,  the  relationship  between  each  independent  variable  and  the 
outcome  can  be  reliably  determined  (to  the  extent  that  the  independent  variable  is  a  strong  predictor  of  the 
outcome).  Alternatively,  one  can  use  the  bootstrap  method  to  test  for  model  instability  (38-39). 
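The events-per-variable guidance translates into simple arithmetic. The sketch below, with hypothetical function names and an invented dataset of 400 patients and 60 events, counts the events as the less frequent binary outcome and reports how many independent variables the 10-, 15-, and 20-event rules would allow.

```python
def count_events(outcomes):
    """Number of events, taking the less frequent binary outcome as the event."""
    positives = sum(1 for y in outcomes if y == 1)
    return min(positives, len(outcomes) - positives)


def max_independent_variables(n_events, events_per_variable):
    """Largest number of predictors the events-per-variable rule allows."""
    if events_per_variable <= 0:
        raise ValueError("events_per_variable must be positive")
    return n_events // events_per_variable


outcomes = [1] * 60 + [0] * 340   # invented data: 400 patients, 60 events
n_events = count_events(outcomes)
for epv in (10, 15, 20):
    print(epv, max_independent_variables(n_events, epv))
```

With 60 events, the 10-event rule permits six variables, while the stricter 15- and 20-event rules permit four and three.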

Clinical events: A clinical event is defined as the least frequent clinical outcome (4). Thus, for a
binary outcome, e.g., alive or dead, whichever occurs least often is the event. The optimal event rate
for  the  analysis  of  a  binary  outcome  is  50%.  As  the  event  rate  diverges  from  50%  toward  0%  or  100%  it 
becomes  easier  to  make  predictions  because  a  model  will  predict  that  the  more  frequent  event  will  always 
occur  and  it  will  be  correct  more  and  more  of  the  time.  For  example,  in  terms  of  percent  correct,  if  the 
event  rate  is  10%  the  model  will  be  correct  90%  of  the  time  if  it  always  predicts  the  occurrence  of  the 
non-event.  In  other  words,  statistical  models  can  learn  to  ignore  the  independent  variables  and  “bet  on  the 
frequency”  (4).  In  fact,  in  clinical  conditions  with  very  low  event  rates,  it  is  rarely  possible  for  the 
independent variables to do as well at predicting the outcome as betting on the frequency. This illustrates
why  an  analysis  cannot  be  based  on  an  accuracy  measure  such  as  percent  correct.  The  ROC  adjusts  for  the 
event  frequency. 
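The 10% example can be checked directly. In this sketch (invented data, hypothetical function names), a model that ignores its inputs and always predicts the non-event is correct 90% of the time, yet its area under the ROC curve is 0.5, i.e., chance.

```python
def percent_correct(predictions, outcomes):
    """Fraction of patients whose predicted class matches the true outcome."""
    hits = sum(1 for p, y in zip(predictions, outcomes) if p == y)
    return hits / len(outcomes)


def auc(scores, outcomes):
    """Area under the ROC curve by pairwise comparison (ties count 0.5)."""
    pos = [s for s, y in zip(scores, outcomes) if y == 1]
    neg = [s for s, y in zip(scores, outcomes) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


outcomes = [1] * 10 + [0] * 90     # 10% event rate
bet_on_frequency = [0] * 100       # always predict the non-event

print(percent_correct(bet_on_frequency, outcomes))
print(auc(bet_on_frequency, outcomes))
```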

Combining molecular biomarkers: Although a detailed discussion of the acquisition and analysis
of molecular factors for purposes other than biomarker validation is beyond the scope of this paper
(40-41), there are certain issues related to the validation of these biomarkers that must be addressed. One can
combine  molecular  biomarkers  under  various  rubrics  including  panels,  patterns,  profiles,  signatures,  and 
pathways.  The  goal  of  combining  biomarkers  is  usually  to  increase  predictive  power  beyond  that  afforded 
by an individual molecular biomarker. There are at least two approaches to combining molecular
biomarkers.  One  approach  can  be  called  “naive”  because  it  groups  biomarkers  using  a  statistical 
algorithm  that  does  not  use  any  previously  known  information  regarding  the  biomarkers  or  the 
relationships  between  the  biomarkers.  The  idea  is  that  the  relationship  between  the  biomarkers  will 
become  apparent  through  their  statistical  association. 

Another  approach,  which  can  be  called  “functional,”  captures  the  power  inherent  in  functionally 
related  biomarkers,  such  as  membership  in  a  biological  pathway  that  is  related  to  the  disease  process.  The 
idea  is  that  there  is  prior  scientific  knowledge  that  certain  biomarkers  are  related  to  each  other  and  to  the 
disease  and  this  previous  knowledge  can  be  used  to  inform  the  statistical  model.  In  other  words,  naive 
groupings make use of the numerical information in the dataset but they ignore all other information,
whereas  functional  groupings  use  the  numerical  information  in  the  dataset  and,  in  addition,  they  take 
advantage  of  previous  scientific  knowledge  regarding  relations  among  biomarkers. 

A  functional  group  can  be  any  set  of  related  biomarkers.  There  is  no  restriction  on  the  composition 
of  a  functional  group  other  than  it  must  consist  of  factors  that  are  related  to  the  disease  process.  The  idea 
is  that  a  subset  of  the  pathway  factors  will  be  active  at  any  one  time  and  thus  are  predictive  of  the  course 
of  the  disease.  Further,  one  would  like  to  include  multiple  orthogonal  pathways,  i.e.,  each  providing  new 
information  regarding  the  disease  process,  in  order  to  further  increase  predictive  power.  Because  of  the 
multiplicity  of  molecular  biomarkers  that  comprise  a  pathway,  the  functional  approach  can  be  an  effective 
way  to  combine  many  related  biomarkers.  The  biomarkers  in  the  pathway  can  be  integrated  using  partial 
least  squares,  principal  components,  or  similar  dimension-reduction  strategies  and  the  integrated 
biomarkers  can  be  one  variable  in  the  multivariate  statistical  model.  Thus,  each  orthogonal  pathway  can 
be  represented  as  a  variable  in  a  multivariate  model.  Generally,  functional  groups,  rather  than  individual 
biomarkers,  have  the  greatest  chance  of  being  strongly  predictive. 
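As a minimal sketch of integrating a pathway's biomarkers into one variable, the code below computes the first principal component of a small, invented pathway matrix by power iteration, using only the standard library. It assumes the biomarkers are already on comparable scales (in practice they would often be standardized first), and the function name and data are hypothetical. The resulting per-patient score could then enter a multivariate model as a single variable.

```python
import math


def first_principal_component(rows, iterations=200):
    """Return (loadings, scores) for the first principal component.

    rows: one list of biomarker values per patient, all from one pathway.
    """
    n, k = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(k)]
    centered = [[r[j] - means[j] for j in range(k)] for r in rows]
    # Sample covariance matrix of the pathway's biomarkers.
    cov = [[sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(k)]
           for a in range(k)]
    # Power iteration converges to the dominant eigenvector of cov, provided
    # the starting vector is not orthogonal to it.
    vec = [1.0 + j for j in range(k)]
    norm = math.sqrt(sum(x * x for x in vec))
    vec = [x / norm for x in vec]
    for _ in range(iterations):
        nxt = [sum(cov[a][b] * vec[b] for b in range(k)) for a in range(k)]
        norm = math.sqrt(sum(x * x for x in nxt))
        if norm == 0.0:
            break
        vec = [x / norm for x in nxt]
    scores = [sum(c[j] * vec[j] for j in range(k)) for c in centered]
    return vec, scores


# Invented pathway: five patients, three biomarkers.
pathway = [
    [1.0, 2.0, 0.5],
    [2.0, 4.1, 0.4],
    [3.0, 6.2, 0.6],
    [4.0, 7.9, 0.5],
    [5.0, 10.1, 0.5],
]
loadings, pathway_score = first_principal_component(pathway)
print([round(w, 3) for w in loadings])
print([round(s, 3) for s in pathway_score])
```

Partial least squares, which the text also mentions, differs in that it uses the outcome when forming the combined variable; principal components, as here, do not.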

Sometimes investigators create a group of genes, e.g., 10 genes, and claim that this is a unitary gene
signature  and  the  genes  are  necessary  and  sufficient  to  be  the  signature  for  some  outcome.  But  when  the 
researchers  attempt  to  reproduce  their  finding  in  another  study,  instead  of  reproducing  the  entire  group  of 
significant  genes,  they  find  that  only  6  of  the  genes  are  significantly  associated  with  the  outcome  in  the 
repeat  study.  They  may  wish  to  claim  that  the  6  genes  are  now  the  validated  gene  signature.  The  problem 
is  that  the  researchers  cannot  claim  that  the  signature  is  composed  of  10  genes  when  only  6  of  the 
signature  genes  can  be  reproduced,  nor  can  they  claim  that  the  combination  of  6  genes  is  a  new,  replicated 
signature. Clearly, one cannot have it both ways: one cannot claim that there is a validated gene pattern
when  the  pattern  does  not  replicate  completely  or  abandon  the  pattern  for  another  pattern,  yet  claim  that 
the  original  signature  was  replicated.  On  the  other  hand,  in  a  functional  group,  when  one  claims  that  a 
related  set  of  genes  is  the  predictive  unit  of  analysis,  it  is  not  expected  that  all  the  genes  in  the  group  will 
always  be  significantly  over  or  under  expressed. 

One  method  for  assessing  the  predictive  power  of  a  biomarker  in  a  multivariate  model  is  to  remove 
the  biomarker  from  the  model  and  observe  a  change  in  predictive  accuracy  (28).  In  this  approach  each 
variable  is,  in  turn,  removed,  assessed,  and  returned  to  the  model.  The  idea  is  that  if  the  biomarker  is  a 
powerful  predictor  a  large  decrement  in  accuracy  will  be  observed  when  it  is  removed.  It  should  be  noted 
that this is a complex process since it also involves issues related to collinearity and levels of analysis.
Analysis  levels  refer  to  the  type  of  units  being  analyzed.  For  example,  one  can  posit  three  levels  of 
analysis  in  cancer,  namely,  epidemiologic,  e.g.,  age,  race,  etc.,  anatomic  and  cellular,  e.g.,  tumor  size, 
histology,  etc.,  and  molecular-genetic,  e.g.,  ER,  PR,  HER-2  (18). 
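
The remove-assess-return procedure described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the method of reference (28): the data are synthetic, the model is a plain gradient-descent logistic regression, accuracy is measured as the area under the ROC curve, and every name (`marker`, `age`, `noise`, `drop_one_decrements`) is hypothetical.

```python
import math
import random

def auc(scores, labels):
    """Area under the ROC curve by pairwise comparison (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def fit_logistic(X, y, steps=300, lr=0.1):
    """Gradient-descent logistic regression; returns weights, bias first."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(steps):
        grad = [0.0] * len(w)
        for row, target in zip(X, y):
            z = w[0] + sum(wi * xi for wi, xi in zip(w[1:], row))
            err = 1.0 / (1.0 + math.exp(-z)) - target
            grad[0] += err
            for j, xi in enumerate(row):
                grad[j + 1] += err * xi
        w = [wi - lr * g / len(X) for wi, g in zip(w, grad)]
    return w

def predict(w, X):
    """Linear score for each row (monotone in probability, so fine for AUC)."""
    return [w[0] + sum(wi * xi for wi, xi in zip(w[1:], row)) for row in X]

def drop_one_decrements(X_train, y_train, X_test, y_test, names):
    """Remove each variable in turn, refit, and record the loss in test AUC."""
    full_auc = auc(predict(fit_logistic(X_train, y_train), X_test), y_test)
    decrements = {}
    for j, name in enumerate(names):
        keep = [k for k in range(len(names)) if k != j]
        sub = lambda X: [[row[k] for k in keep] for row in X]
        w = fit_logistic(sub(X_train), y_train)
        decrements[name] = full_auc - auc(predict(w, sub(X_test)), y_test)
    return full_auc, decrements

# Synthetic data (an assumption for illustration only): the outcome depends
# strongly on "marker", weakly on "age", and not at all on "noise".
random.seed(0)
names = ["marker", "age", "noise"]
X, y = [], []
for _ in range(400):
    row = [random.gauss(0, 1) for _ in names]
    logit = 2.0 * row[0] + 0.5 * row[1]
    X.append(row)
    y.append(1 if random.random() < 1 / (1 + math.exp(-logit)) else 0)

full_auc, decrements = drop_one_decrements(X[:200], y[:200], X[200:], y[200:], names)
for name in names:
    print(f"{name}: AUC decrement {decrements[name]:+.3f}")
```

In this sketch the variables are generated independently. When variables are collinear, removing one of them may produce only a small decrement because a correlated variable carries much of the same information, which is one reason the text calls the process complex.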

Time  denomination  of  the  biomarker.  Predictions,  i.e.,  the  probability  of  the  occurrence  or 
non-occurrence  of  an  event,  must  always  be  time  denominated  (1,4).  For  example,  the  probability  of  an  event 
occurring  in  five  years  is  different  from  the  probability  of  that  same  event  occurring  in  ten  years.  There  are 
two  reasons  why  the  prediction’s  duration  must  accompany  its  numerical  estimate.  First,  time  itself 
affects  the  probability  of  the  outcome.  For  example,  it  may  be  more  difficult  to  make  predictions  in  the 
middle  of  the  time  interval  (where  the  interval  is  bounded  by  the  index  date  and  the  end  of  the  study). 
Second,  the  biomarker  may  be  related  to  the  disease  (“active”)  only  at  a  particular  time  in  the  disease 
process,  rather  than  uniformly  across  the  course  of  the  disease.  Thus,  a  biomarker  may  be  useful  in 
predicting  an  outcome  at  2  years  but  not  useful  in  predicting  the  same  outcome  at  10  years.  In  other 
words,  when  a  biomarker  makes  a  prediction,  that  prediction  is  only  relevant  for  a  defined  population,  a 
specific  outcome,  and  over  a  specified  period  of  time.  Finally,  lifetime  predictions  are  rarely  clinically 
useful  because  it  is  not  clear  what  the  duration  of  the  patient’s  lifetime  will  be;  therefore,  the  time  interval 
of  the  prediction  is  unknown. 
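
To make the first point concrete, the short sketch below shows how the same underlying risk yields different probabilities at different horizons. It assumes a constant annual hazard (an exponential time-to-event model), which is a simplification chosen only for illustration; the rate of 0.05 per year and the function name are hypothetical.

```python
import math

def event_probability(annual_hazard, years):
    """Probability that the event occurs within `years`, assuming a
    constant hazard: P(t) = 1 - exp(-hazard * t)."""
    return 1.0 - math.exp(-annual_hazard * years)

hazard = 0.05  # hypothetical constant hazard per year
p5 = event_probability(hazard, 5)
p10 = event_probability(hazard, 10)
print(f"5-year probability:  {p5:.3f}")
print(f"10-year probability: {p10:.3f}")
```

Under this assumption the ten-year probability is larger than the five-year probability but less than double it, so a numerical estimate reported without its time horizon cannot be interpreted.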

Conclusion 

If  we  are  going  to  model  diseases  in  terms  of  their  molecular  characteristics,  and  these  models  are 
going  to  drive  future  advances  in  medical  care,  then  translational  science  must  produce  clinically 
validated  molecular  biomarkers.  Unfortunately,  molecular  biomarkers  are  subtle  and  complex  entities, 
and  their  validation  is  challenging.  Advances  in  the  validation  of  clinically  useful  biomarkers  require  an 
unambiguous  scientific  nomenclature,  clearly  described  and  defined  methods,  and  clinically  relevant  uses 
if  the  molecular  biomarkers  are  to  significantly  impact  medical  care.  To  minimize  the  reporting  and  use  of 
biomarkers  that  cannot  be  validated,  a  straightforward  three-stage  approach  to  biomarker  validation  is 
described.  The  three  stages  are:  1)  biomarker  identification,  characterization  and  evaluation,  2)  data  and 
model  testing,  and  3)  replication  of  results.  This  provides  a  scientific  approach  that,  if  followed,  offers  a 
high  degree  of  certainty  that  a  validated  biomarker  will  be  a  true  and  clinically  useful  predictor  of 
disease-related  outcomes. 

Table  1.  Characteristics  of  validation 

Category                   Stage 1                        Stage 2                                  Stage 3
Investigator               Original                       Original                                 Independent
Data                       Retrospective or prospective   Different retrospective or prospective   Different prospective
Analysis                   Any                            Pre-specified                            Pre-specified
Minimum Model Accuracy*    0.75                           0.70                                     0.65
Reporting results          No                             Qualified                                Unqualified

*  Minimum  model  accuracy  is  the  discriminative  accuracy  of  the  statistical  model  that  includes  the 
biomarker,  where  the  biomarker  adds  significant  predictive  accuracy  to  the  model. 
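
As a rough illustration of how the criteria in Table 1 might be applied, the sketch below encodes the minimum accuracy and reporting entry for each stage and checks a candidate model against them. The dictionary layout, the function name, and the wording of the returned messages are hypothetical; only the thresholds and reporting entries come from the table, and the check on added predictive accuracy follows the table footnote.

```python
# Thresholds and reporting entries from Table 1; the structure is illustrative.
STAGE_CRITERIA = {
    1: {"min_accuracy": 0.75, "reporting": "No"},
    2: {"min_accuracy": 0.70, "reporting": "Qualified"},
    3: {"min_accuracy": 0.65, "reporting": "Unqualified"},
}

def reporting_status(stage, model_accuracy, adds_significant_accuracy):
    """Return the Table 1 reporting entry if the stage criteria are met.

    `model_accuracy` is the discriminative accuracy of the model that
    includes the biomarker; `adds_significant_accuracy` states whether the
    biomarker significantly improves that model (see the table footnote).
    """
    criteria = STAGE_CRITERIA[stage]
    if not adds_significant_accuracy:
        return "Criteria not met: biomarker adds no significant accuracy"
    if model_accuracy < criteria["min_accuracy"]:
        return "Criteria not met: model accuracy below stage minimum"
    return criteria["reporting"]

print(reporting_status(1, 0.78, True))
print(reporting_status(2, 0.68, True))
print(reporting_status(3, 0.68, True))
```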